The Bureau of the Census of the U.S. Department of Commerce has long
had an interest in developing postcensal estimates of population for areas
smaller than states. Until the 1970s, its population estimation program
was limited to counties, large cities, and metropolitan areas. During the
1970s, however, the Census Bureau undertook the major task of making
estimates of population and per capita income for some 39,000 general
purpose local jurisdictions. This undertaking was stimulated by the State
and Local Fiscal Assistance Act of 1972 (P.L. 92-512), commonly referred
to as general revenue sharing (GRS), which requires that the most recently
available data provided by the Census Bureau be used to determine the
allocation of GRS funds among the states and approximately 39,000 eligible
units of local government (counties and subcounty areas).

The Census Bureau’s program of estimates is important not only
because large amounts of federal funds are allocated directly on the basis
of those estimates but also because population estimates are basic to other
measures, such as current vital rates. Planners and decision makers at the
state and local levels also rely heavily on the small-area estimates.

At the request of the Census Bureau and the Office of Revenue Sharing
of the U.S. Department of the Treasury, the Committee on National
Statistics in July 1978 established the Panel on Small-Area Estimates of
Population and Income. The Panel included persons with expertise in the
areas of statistics, demography, and economics. (Biographical sketches of
Panel members appear in Appendix L.)

The Panel was charged with the general task of evaluating the Census
Bureau’s procedures for making postcensal estimates of population and
per capita income for local areas. More specifically, the Panel was asked 
to review methods currently used and possible alternate methods, review 
data sources currently used and possible alternate sources, and assess 
levels of accuracy of current estimates in light of the uses made of them 
and of the effects of potential errors on these uses. 

In carrying out its task, the Panel was asked to develop its recommen- 
dations in the light of the 5-year schedule for future censuses and available 
information on the census undercount; consider criteria for choosing 
among data sources and techniques — for example, the importance of 
uniformity and consistency in order to treat different localities equitably — 
and the standards of accuracy required for places of different sizes; con- 
sider the error structure inherent in the estimates, how estimates of error 
might be prepared, and how (if at all) such estimates might be conveyed to 
users; and consider the appropriate role for State agencies in cooperating 
in the estimating process. 

Because a complete description of the detailed procedures used by the 
Bureau to prepare the estimates of population and income was not 
available in written form, the first task undertaken by the project staff was 
the preparation of Appendix A, “Postcensal Population Estimation 
Methods of the Census Bureau.” Although the authors spent a con- 
siderable part of 2 months at the Census Bureau preparing this appendix, 
it is not an official report of the Census Bureau, and some of the minute 
details of the procedures (which are based on written census reports, sup- 
plemented by discussions with census staff) may not, despite the authors’ 
efforts, be described exactly as actually carried out during the 1970s. A 
similar qualification applies to Appendix B (the summary of income esti- 
mation methodology) and to the statements in Chapter 1 concerning the 
rationale of the methodology, the criteria of accuracy used, and the 
reasons for the methodological decisions made by the Census Bureau. 

The Panel acknowledges with gratitude the assistance received from 
many individuals who cooperated in the study: Meyer Zitter, Roger Her- 
riot, Richard Engels, Mary Kay Healy, and Robert Fay of the Census 
Bureau; Matthew Butler, Kent Peterson, and Jack McGuire of the Office 
of Revenue Sharing; Joseph Duncan of the Office of Federal Statistical 
Policy and Standards and Edwin Coleman of the Bureau of Economic
Analysis of the U.S. Department of Commerce consulted with members 
and staff on several occasions. Many other members of the Census Bureau 
provided assistance, and special thanks are due David Word, Frederick 
Cavanaugh, David Galdi, Joseph Knott, Sharon Baucom, Jerome Glynn, 
Richard Irwin, Jennifer Marks, Edward Hanlon, Marianne Roberts, Bar- 
bara van der Vate, Louisa Miller, Frances Barnett, Joel Miller, and Mar- 



Bureau of Health Planning, U.S. Department of Health, Education, and
Welfare, were helpful in explaining uses of the postcensal estimates by
their respective agencies.

Members and staff of the Committee on National Statistics provided
advice at many phases of the Panel’s work. Margaret Martin, past
executive director, Edwin Goldfield, executive director, and Miron Straf,
research director, were generous with support, criticism, and guidance.

Above all, the Panel wishes to acknowledge the major contribution of
the project staff. Bruce Spencer, study director, had overall responsibility
for coordinating the work of the Panel and the staff, and he made
important contributions to every phase of the study. He provided the working
materials for the Panel, organized its meetings, prepared many of the
background papers that served as the basis for our discussions, and was
largely responsible for drafting this report. Che-Fu Lee and the late Walt
Simmons also contributed to parts of the project. Linda Jones was
secretary for nearly all of the Panel’s duration. We also acknowledge the
superb editing skills of Jean Savage, Elaine McGarraugh, and, especially,
lie Grohman. 

Finally, I wish to thank the members of the Panel for their willingness
to contribute their time and specialized knowledge to the tasks assigned.
A number of Panel members prepared background papers for our
discussions. Some of their contributions appear in the appendices; others
have been incorporated in the text of the report. This report represents the
consensus of the Panel on the issues addressed. Needless to say,
however, no individual member of the Panel should or would want to be
held responsible for every detail or point of view expressed.

EVELYN M. KITAGAWA, Chairman

Panel on Small-Area Estimates of Population and Income







Overview and Recommendations


1.1 INTRODUCTION

1.1a BACKGROUND

The decennial census provides counts of the enumerated population^ and
estimates of per capita income for detailed geographic areas of the United
States at 10-year intervals. In the years following a census this information
becomes outdated as the population and per capita income of areas change.
The objective of the Census Bureau’s postcensal estimation program is to
update the census information on population and per capita income for
approximately 39,000 general purpose governmental units, more than half of
which have populations of less than 1,000.

The preparation of postcensal estimates for those many small areas was
prompted by the State and Local Fiscal Assistance Act of 1972 (P.L.
92-512), which required that the most recently available data on population
and per capita income provided by the Census Bureau be used in the
formulas that determine the annual (or biannual) allocation of general
revenue sharing (GRS) funds among 39,000 eligible units of government.
In addition to their use in determining the allocation of GRS and other


^Although the population counts derived from the decennial census enumerations are
designed to be complete and accurate, they are known to contain errors of omission,
duplication, and misclassification. The Census Bureau publishes estimates of the net undercount of
population by sex, race, and age for the United States as a whole (Bureau of the Census,
a). 
federal assistance funds (a total of more than $36 billion per year), these
estimates also serve a wide variety of needs of state and local governments, 
private organizations, and scholarly research (see section 1.1b). 

Although the Census Bureau had been working on methods for estimat- 
ing the population of states and large counties and cities since the 1940s and 
had published its first series of estimates for all counties in the United States 
in 1966, the methodology for small areas was in the early developmental 
stages when the 1972 act was drafted. In a hearing before the Ways and 
Means Committee, Census Bureau officials stated that the methodology for 
producing estimates for small local areas was not yet developed and tested 
and could be very inaccurate for places of population under 50,000 (U.S. 
Congress, 1972). In drafting the legislation, Congress did not require that
postcensal estimates be produced regardless of accuracy but only that the 
most recent data provided by the Census Bureau be used for general 
revenue sharing allocations. Especially for population estimates at the sub- 
county level and for estimates of per capita income, the methodology cur- 
rently used by the Census Bureau to make postcensal estimates was 
developed to a great extent after the GRS law was enacted.

The Panel’s review of the postcensal estimation program of the Census 
Bureau has included an examination of the logic and the accuracy of the 
methods used to derive estimates of population and per capita income. Our 
tentative assessment of their accuracy is based primarily on comparisons of 
the postcensal estimates with the results of special censuses carried out dur- 
ing the 1970s. More conclusive evaluation awaits comparison of the 
estimates with the results of the 1980 decennial census. 

Although the postcensal estimation program produces estimates of total 
population and per capita income for approximately 39,000 areas, the 
estimation methodology is designed to measure the change in total popula- 
tion and per capita income of each area since the last national census 
enumeration. The estimates of change for each area are applied to its 
population and per capita income as determined in the last census. Thus 
the implicit objective of the methodology is the estimation of postcensal 
change in population and per capita income, and the Panel has evaluated 
the methodology from this perspective as well as in terms of the accuracy 
of the estimates of total population and per capita income. The accuracy 
of estimates of postcensal change is of critical importance when the 
postcensal estimates are used to calculate the allocation of general revenue 
sharing funds, because it is changes (since the last census) in population 
and per capita income of areas that produce changes in the allocation of 
funds. 

The Panel has not addressed the question of whether or not the post- 
censal estimates of population should be adjusted for census undercount 



census data and postcensal estimates must be consistent in this respect),
this was recently considered by another panel of the National
Research Council (1978). Second, if a decision were made to adjust the
population estimates for census undercount, essentially the same postcensal
methodology currently used by the Census Bureau could be used to
estimate postcensal change; the major difference in procedure would be
that the reported data from the last census would be adjusted for census
undercount before being added to the estimates of postcensal change.

1.1b NEEDS FOR POSTCENSAL ESTIMATES

The Census Bureau currently produces postcensal estimates of population
and income for approximately 39,000 general purpose governmental units
that are eligible for general revenue sharing funds. Table 1.1 shows the
numbers of county and municipal and township governments and their
estimated population, classified by size. The overwhelming majority of
these areas have very small populations. For example, 85 percent of the
)84 municipalities and townships had less than 5,000 population in 
1975, 54 percent had less than 1,000 population, and 36 percent had less
than 500 population.

The postcensal estimates of population and income for those areas are
used in a variety of activities, including the allocation of federal funds,
public and private planning and decision making, determining the
eligibility of a locality for self-government, and scholarly research. The
importance of the estimates in determining the allocations made under
federal grant programs has been stressed both in the professional
literature and in the courts. More than 100 programs make allocations partly or
wholly on the basis of population estimates (U.S. Congress, 1978). In fiscal
1975, nearly $36 billion was distributed under the 10 largest grant
programs that use population and income data to determine allocations
(Office of Federal Statistical Policy and Standards, 1978). The general
revenue sharing program alone (P.L. 92-512) distributes more than $6
billion a year.

In some instances the postcensal estimates are used to decide whether a
place is eligible to receive benefits of one kind or another. The eligibility
test may be whether the estimate for the place has exceeded a threshold
value. For example, to receive funds under Title I of the Comprehensive
Employment and Training Act (CETA) programs (P.L. 93-203), an area
must have (or be part of a consortium that has) a population of at least
100,000. To receive funds under Title II of CETA, an area must have a
Note: Because of rounding, population detail may not add to total. The total population of counties (189,691) is lower than the total U.S. population
because the 3,042 county governments considered exclude 106 county-type areas that do not possess independently organized county governments. 
These areas include all of Connecticut, Rhode Island, the District of Columbia, New York City, Philadelphia, and San Francisco, among others. 

Sources: Bureau of the Census (1978a, Tables B, C, and E) and unpublished data from the Bureau of the Census.
population of at least 50,000. Similarly, to qualify for receiving funds
under the Community Development Block Grant program (P.L. 93-383),
an area must have a population of at least 50,000. Thresholds can also be
in the form of limits: to be eligible to receive funds for state and regional
solid waste plans from the Rural Communities Assistance Program, a
municipality or county must not be larger in population than 5,000 or
00, respectively. 

Thresholds apply to activities other than fund allocation. In some states
a locality cannot become self-governing unless its population, as
determined by a census or postcensal estimate, exceeds a fixed level.
Population size also determines how a community is classified by the state
government (as a class 1, class 2, class 3, or other kind of city), which
limits the powers, duties, and obligations of the local governmental
units. In general, class 1 cities exercise more self-government, have
broader taxing powers, and may provide more services than class 2 or
class 3 cities. In some states a city’s classification also determines such
things as the maximum salaries for public officials and the maximum
number of establishments permitted to sell liquor.^

Many other measures, including employment, unemployment, and
birth and death rates, depend implicitly on the postcensal population
estimates. Like the postcensal population estimates, these measures are
used not only to determine fund allocations but also to identify and
analyze problems, to formulate policies to ameliorate the problems, and
to evaluate the effects of the adopted policies. These measures are also
used in basic scholarly research to formulate and test theories. (An appendix
illustrates the way postcensal population estimates are used in
computing official measures of employment and unemployment.)

Planners and decision makers in the private and public sectors rely on
the postcensal estimates to evaluate current population trends. Data on
these trends are especially useful for heterogeneous regions in which some
local areas are gaining and others are losing population. For example, the
kinds of plans and decisions that need to be made about education,
health, police, and sanitation services differ importantly for growing and
declining areas. Particular use is made of the estimates by the more than
... health systems agencies (HSAs) to develop health plans and review
proposed health programs. The postcensal estimates also play a role in the
determination of amounts of funds allocated to each HSA under the
National Health Planning and Resources Development Act of 1974; those
allocations are used to fund promising health programs.

^Population thresholds for different city classes for the 50 states are given by the Bureau of
the Census (1978a).


campaigns, and market research. Several private companies even specialize in
further disaggregating the Census Bureau’s small-area estimates into
estimates for census tracts within geopolitical boundaries. Small-area 
data are frequently used for developing estimates for areas other than the 
standard ones, such as market areas. As more businesses become aware of 
the existence and utility of small-area estimates, their use is expanding. 


1.1c POSSIBLE APPROACHES TO MEET NEEDS FOR POSTCENSAL 
ESTIMATES 

There are many conventions for generating postcensal estimates; they vary 
in both cost and accuracy. One convention might satisfy the needs of some 
users but not of others. This section describes seven possible conventions, 
ranging roughly from least to most expensive. This list is by no means ex- 
haustive but indicates that there are alternatives and that there is a rather 
wide trade-off between accuracy and cost. For the sake of simplicity these 
conventions apply to population only; the extension of these conventions 
to estimates of income would be straightforward. 

1. Use of decennial census counts^ The least costly convention accepts 
the decennial census counts for the following decade. Thus population 
counts would be updated only every 10 years. Quite clearly, the estimates 
would generally become less accurate over the decade since the last cen- 
sus. Yet this convention is currently used for establishing the number of 
U.S. Representatives to which each state is entitled. 

2. Use of decennial census counts with a rate of change equal to that of 
the nation or the state Convention 1 can be modified to account for esti- 
mated growth of the national population over the decade. The simplest 
way to allocate growth is to assume that every place grows at the same rate
as the nation. Such a convention would clearly not differ in effect from 
convention 1 if a fixed pie were being shared on the basis of population, 
but some areas would cross thresholds. One could also prepare estimates 
of the change in population for each state (only 51 estimates) and assume 
that every place within a state grows at the state rate. This convention 
would at least allow some regional variation in the distribution of popula- 
tion over the decade. The added cost over convention 1 is minimal. 


^This discussion ignores the fact that not all persons are counted in the census.


3. Use of decennial census counts updated by natural increase Because
data on births and deaths are available through the vital registration
system, the census counts could be updated by the addition of births and
the subtraction of deaths since the census date. Such a convention still
ignores migration, which is known to be a larger component of change than
natural increase for many local areas. The added cost over convention 1 is

4. Use of decennial census counts updated by natural increase and
estimates of migration The addition of a migration component considerably
increases the required sources of data needed to prepare estimates. The
United States has no requirement that people report a change of address
to a central statistical authority. Hence migration must be estimated by
the use of symptomatic indicators such as school enrollment, housing,
etc., or by address information contained in annual Internal
Revenue Service (IRS) returns. Even if a migration component is not
estimated directly, available methods for estimating population including
migrants require data on these symptomatic indicators. This convention is
the one now used by the Census Bureau. The added cost of the Bureau’s
current procedures over convention 1 is perhaps $20 million per decade
for ... updates over the decade (excluding the cost of updating the
geographic coding guide).^


5. Use of decennial census counts augmented by a mid-decade census If
a census enumeration were held every 5 years, one objection to convention
1 would be softened. Perhaps accurate 5-year updates would provide
information on change that is sufficiently timely to meet the needs of many
users and the requirements of many uses of local area data. The
additional cost, however, is probably in excess of $600-$700 million. Of
course, a mid-decade census would serve many other uses as well, so the
cost should not be attributed entirely to the small-area population and
income estimates. Even if the mid-decade “census” did not attempt
complete enumeration but was a large-scale sample survey that provided
accurate estimates for small areas, the cost would still be great, probably
more than $500 million.


^The geographic coding guide is used by the Census Bureau to assign mailing addresses on
Internal Revenue Service individual income tax forms to places of residence. This procedure
is important both in making population estimates by the administrative records method and
in making postcensal per capita income estimates. The cost of the most recent updating of
the coding guide in 1975 was roughly $9 million. (For further discussion, see section 1.2a;
Appendix A, sections 2.9, 3.9, and 4.1d; and Appendix K.)


local authorities when he or she moves. Such a system is capable, at least
in theory, of providing population counts at any point in time. Population 
registers are maintained in the Netherlands, Finland, Belgium, Norway, 
Sweden, and Denmark and among certain Indian tribes in the United 
States. The Panel does not consider this to be a practicable or desirable 
alternative for the United States. 
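
The remark under convention 2, that uniform growth leaves a fixed-pie allocation unchanged while still moving some areas across thresholds, can be checked with a small sketch. The area names, populations, growth rate, and the 50,000 threshold below are hypothetical values chosen only for illustration.

```python
def shares(populations):
    """Return each area's fraction of the total population."""
    total = sum(populations.values())
    return {area: pop / total for area, pop in populations.items()}


def grow_uniformly(populations, rate):
    """Apply the same growth rate to every area (convention 2)."""
    return {area: pop * (1 + rate) for area, pop in populations.items()}


# Hypothetical census counts and an assumed 5 percent national growth rate.
census = {"A": 49_000, "B": 120_000, "C": 31_000}
updated = grow_uniformly(census, 0.05)

THRESHOLD = 50_000  # hypothetical eligibility threshold

# Areas below the threshold at the census but at or above it after updating.
newly_eligible = [
    area for area in census
    if census[area] < THRESHOLD <= updated[area]
]
```

Under these assumed figures every area keeps the same share of the total, so a formula that divides a fixed sum by population share gives the same result as convention 1, yet area A moves from below to above the threshold.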

1.1d CONSIDERATIONS OF ACCURACY

Three factors are important in the production of postcensal estimates of 
population and income: accuracy, timeliness, and low cost. Timeliness
refers to the availability of estimates within a short time after their 
reference dates. Low cost is usually thought of as a constraint rather than 
a goal, but actually each of the three goals constrains attempts to satisfy 
the other two. This section focuses on the accuracy of the estimates. 

An estimate is considered accurate if it is close to the value of the 
parameter (population or per capita income) it is estimating, which is 
typically unknown. A variety of measures of this closeness or accuracy can 
be defined. Ideally, an estimating procedure (or estimator) should meet 
four criteria: (1) low average error, (2) low average relative error, (3) few 
extreme relative errors, and (4) absence of bias for subgroups. “Error” is 
defined here as the difference between the estimate and parameter. “Rela- 
tive error” is the error expressed as a proportion or percent of the 
parameter. “Average error” (or “average relative error”) refers to the 
arithmetic mean of the errors (or relative errors) disregarding sign (i.e., 
plus or minus). Bias means that an estimating procedure produces 
estimates that tend to be too high or too low for certain classes of areas. 
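
The measures defined above can be written out as a short sketch. The function names, the three (estimate, parameter) pairs, and the cutoff used for an "extreme" relative error are illustrative assumptions, not values taken from the Census Bureau or the Panel.

```python
from statistics import mean


def relative_error(estimate, actual):
    """Signed error as a proportion of the parameter."""
    return (estimate - actual) / actual


def average_error(pairs):
    """Mean of the errors, disregarding sign (criterion 1)."""
    return mean(abs(est - act) for est, act in pairs)


def average_relative_error(pairs):
    """Mean of the relative errors, disregarding sign (criterion 2)."""
    return mean(abs(relative_error(est, act)) for est, act in pairs)


def share_of_extreme_errors(pairs, cutoff):
    """Fraction of areas whose relative error exceeds an assumed cutoff (criterion 3)."""
    extreme = [1 for est, act in pairs if abs(relative_error(est, act)) > cutoff]
    return len(extreme) / len(pairs)


def mean_signed_relative_error(pairs):
    """Mean relative error keeping sign; far from zero for a class of areas suggests bias (criterion 4)."""
    return mean(relative_error(est, act) for est, act in pairs)


# Hypothetical (estimate, true value) pairs for three areas of very different size.
pairs = [(1_050, 1_000), (480, 500), (98_000, 100_000)]
```

With these made-up figures the largest area has the smallest relative error but contributes almost all of the average error, which is one way to see why criteria 1 and 2 can point toward different estimators.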

As is often the case in statistics, it is generally not possible to produce a 
set of estimates that will minimize all of the above criteria simultaneously, 
so it is necessary to make choices. Minimization of criterion 2 requires 
that the relative error in the postcensal estimate be small for a place 
selected at random. Of course, the actual magnitude of the relative errors
will depend on unpredictable circumstances, so at best the relative error 
for a place can be low with high probability, or the expectation of the 
relative error can be small. Although the desirability of low average 
relative error is obvious, this criterion may become controversial if one 
small areas.

Criterion 3, few extreme relative errors, means that the relative errors 
for all places should be approximately the same size. (As was noted above, 
if errors are random, “same size” means with high probability, in ex- 
pected value, or in an analogous sense.) Consider a procedure that pro- 
duces a set of 1,000 estimates with a mean relative error of 4 percent. If all 
the relative errors are close to 4 percent, the procedure may be very satis- 
factory, but if the worst 10 percent of cases have an average relative error 
of 20.2 percent while the best 90 percent of cases have an average relative 
error of 2.2 percent, the procedure may be very unsatisfactory. 
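
As a quick check of the arithmetic in this example, the two group averages combine to the stated overall mean of 4 percent. The variable names are ours; the figures are those given in the text.

```python
# Shares of cases and their average relative errors (in percent), as in the text.
worst_share, worst_mean = 0.10, 20.2
best_share, best_mean = 0.90, 2.2

# Overall mean relative error is the share-weighted average of the two groups:
# 0.10 * 20.2 + 0.90 * 2.2 = 2.02 + 1.98
overall_mean = worst_share * worst_mean + best_share * best_mean
```

The same 4 percent average is therefore compatible with a set of uniformly moderate errors and with a set in which one case in ten is off by about a fifth, which is why criterion 3 is stated separately from criterion 2.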

Criterion 4 recognizes that the presence of bias can create political ten- 
sions. Estimation procedures rest on demographic and economic assump- 
tions that may not apply to particular classes of areas. For example, as is 
discussed below (section 1.2a), the administrative records method, which 
uses information on tax returns to estimate migration rates, depends 
crucially on an assumption that the proportion of people filing tax returns 
is the same for migrants into an area, migrants out of an area, and those 
not moving to or from an area during the given time period. 

Criterion 1, low average error, tends to minimize the dollar amounts of 
misallocated funds under formula grant programs such as general revenue 
sharing (GRS) because those allocations are often in practice approximately
proportional to the fraction of total population residing in an area.
Since most of the population live in large areas, emphasis on this criterion 
implies choosing estimators largely according to their performance in pro- 
ducing good estimates for large areas. This criterion is thus in clear con- 
trast to criterion 2. 

In its reports, the Census Bureau indicates primary concern with 
criteria 2 and 3 (low average relative error and few extreme errors) and 
some attention to criterion 4 (bias) in its selection of alternative pro- 
cedures (for example, see Bureau of the Census, 1973b, pp. 2, 10). The 
Census Bureau is conducting research on the biases in the administrative 
records method caused by low income-tax filing rates for estimation of in- 
terstate migration (Bureau of the Census, 1978c). In evaluating the ac- 
curacy of postcensal estimates, the Panel chose to use the same general 
criteria as the Census Bureau. Thus we considered average relative error, 
extreme relative error, and bias. The Panel also believes that considera- 



threshold is involved, then a high level of accuracy may be of supreme con- 
cern. The fact that the population estimate for Trenton, New Jersey, 
dropped from 101,365 in 1975 to 99,672 in 1976 (328 below the threshold
of 100,000 required for prime sponsorship for CETA programs) illustrates
the importance that a small error could have for a city government 
(Bureau of the Census, 1977c, 1979). Given the levels of accuracy inherent 
in the current estimation procedures, the difference of 328 could well have 
been entirely due to error. On the other hand, if a fixed amount of funds is 
to be carved up among geographic areas on the basis of population, then 
only differential error among geographic areas will create disparity be- 
tween the intent of the legislation and the reality of disbursement. Simi- 
larly, private users of such data may tolerate rather large errors, since, for 
example, the decision to locate a business in an area does not require a 
precise estimate of the rate at which an area is growing. 


1.2 CURRENT ESTIMATION METHODOLOGY 

Since the last census in 1970 the Census Bureau has published annual 
postcensal population estimates for states and counties. Postcensal esti- 
mates for subcounty units were first prepared for July 1, 1973, and have 
been prepared annually beginning with those for July 1, 1975. Per capita 
income estimates for states, counties, and subcounty units have been pro- 
duced every year or two since July 1, 1973.^ 

For estimates of postcensal population, the Census Bureau uses essen- 
tially two kinds of methods: component methods and regression methods. 
Component methods first calculate population change, using the number 
of births minus the number of deaths plus the net number of migrants: the 
postcensal estimates are the sum of the estimated population change since 
the last census and the reported population in the last census. In regres- 


^ These population and per capita income estimates for states, counties, and subcounty areas 
are published in the Census Bureau’s Current Population Reports Series P-25 (see Bureau of 
the Census, 1974, 1975b, 1979); provisional and revised county estimates appear in Current 
Population Reports Series P-26 (see Bureau of the Census, 1973b). 
sion methods, equations are constructed to relate observed population
changes to observed changes in other “symptomatic” data that are 
available and considered relevant. Subsequent observed (postcensal) 
changes in symptomatic data are then transformed by the equations to 
yield estimates of postcensal changes in population, which are applied to 
the reported population in the last census. 
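
The two kinds of methods can be sketched in a few lines. This is a deliberately simplified illustration, not the Census Bureau's actual procedure: the regression here uses a single symptomatic variable, expresses changes as proportions, and is fitted by ordinary least squares, all of which are assumptions made for the example, as are the function names and every number.

```python
def component_estimate(base_population, births, deaths, net_migration):
    """Census base plus estimated change (births - deaths + net migrants)."""
    change = births - deaths + net_migration
    return base_population + change


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_estimate(base_population, symptomatic_change, intercept, slope):
    """Turn a postcensal symptomatic change into a population estimate."""
    estimated_change = intercept + slope * symptomatic_change
    return base_population * (1 + estimated_change)


# Component method with hypothetical counts for one area.
component = component_estimate(10_000, births=150, deaths=90, net_migration=-40)

# Regression method: hypothetical past proportional changes in a symptomatic
# series (for example, school enrollment) and in population for three areas.
past_symptomatic = [0.00, 0.10, 0.20]
past_population = [0.01, 0.09, 0.17]
intercept, slope = fit_line(past_symptomatic, past_population)

# Apply the fitted equation to an assumed 5 percent postcensal symptomatic change.
regression = regression_estimate(10_000, 0.05, intercept, slope)
```

In both cases the estimate of level is the last census figure adjusted by an estimate of change, so any error in the census base carries through to the postcensal estimate.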

Estimation of postcensal population change for subnational areas is at 
best a complicated process. Numerous data sources are used. Addresses 
on IRS individual income tax returns for different years are matched to 
estimate internal migration. Immigration and Naturalization Service 
records together with passenger statistics (relating to numbers of persons 
entering and leaving Puerto Rico) form the basis for estimating net im- 
migration from abroad. Data on births and deaths are obtained either 
from state departments of health or from the National Center for Health 
Statistics. For many kinds of data the Census Bureau relies on its contacts 
in the Federal-State Cooperative Program for Local Population Estimates 
(FSCP). For example, the FSCP members provide to the Census Bureau
data on births and deaths from state departments of health; data on 
populations in institutions and military barracks; school enrollments by 
county (used in one component method); and administrative data of different
kinds, such as numbers of driver's licenses issued, size of the labor
force, and numbers of new building permits issued (all used in regression 
methods of estimation). 

For estimates of postcensal per capita income the Census Bureau uses a 
component method. Income change is viewed as the total of the following: 
change in wage and salary income, change in social security income, and 
changes in various other kinds of income. The estimates of changes in in- 
come draw upon data from two sources: Bureau of Economic Analysis 
estimates of components of income for states and counties and IRS
individual income tax returns. The Bureau of Economic Analysis uses
administrative data from hundreds of sources to make its estimates of
components of income (see Coleman, 1978). 

For both population and income, errors in the estimates of change can 
arise both from inappropriateness of assumptions underlying the methods 
and from errors in the data used. In addition, errors in postcensal 
estimates of level (rather than of change) can arise from errors in the base- 
year census data to which the estimates of change are applied. Under- 
count is a significant source of error in the census counts of population. 
The Census Bureau estimates that the 1970 census failed to count 5.3 
million people, or 2.5 percent of the total population (Bureau of the Cen- 
sus, 1975a). The estimated rates of net undercount vary widely for dif- 
come (Bureau of the Census, 1977b; Ono, 1972).

The postcensal estimation methods use current data, and so the esti- 
mates appear after their reference dates. The length of delay varies from 
year to year and by the level of geography of the jurisdiction being esti- 
mated. Because so many data sources are used, a delay in arrival of any 
one set of data can hold up production of the estimates. Several stages of
estimates, corresponding to different delays, are published: the earliest
are “provisional,” for counties there are also “preliminary” estimates, and
the latest are “revised” or “final.” For states, the provisional population
estimates appear
8-17 months after the reference date, and revised estimates follow about a 
year later. For counties, the delays in the population estimates are typi- 
cally 9-15 months for the provisional, 21 months for the preliminary, and 
21-27 months for the revised. The preliminary county estimates are used 
for determining general revenue sharing allocations. For subcounty areas, 
only one set of estimates is usually published, roughly 21 months after 
the reference date. The delays in publication of provisional per capita in- 
come estimates for states, counties, and subcounty units are approxi- 
mately the same as the delays for the subcounty population estimates. The 
revised per capita income estimates follow about a year later.*

The time references for the data do not always correspond to those for 
the estimates. While the target date for the estimates is July 1, the school 
enrollment data used to estimate migration pertain to the preceding
September or October, and the IRS addresses (also used to estimate migration)
pertain to varying dates between the preceding January 1 and April 15. For
states and counties, calendar year birth and death data are interpolated 6

*The differences between provisional and revised population estimates are discussed in some 
detail by the Bureau of the Census (1974, p. 14) for states and in Appendix A for counties. 
The subcounty population estimates are not generally revised (except for the 1973 estimates, 
which were revised because of changes in the geographic coding procedures of the Census 
Bureau). Revisions in per capita income estimates result from changes in data rather than 
changes in procedure. 


1.2a POPULATION ESTIMATION 

This section summarizes the methodology used by the Census Bureau to 
prepare its postcensal population estimates (see Appendix A below for 
further details). We should note at the beginning that the population of an 
area is the number of persons whose place of usual residence is in the area; 
it includes both legal residents and those not legally permitted to reside in 
the United States. 

The estimates for different geographic levels are produced in a hierar- 
chical manner. National estimates are produced first. Then state esti- 
mates are produced and “controlled” to the national estimate: that is, the 
state estimates are scaled to sum to the previously derived national esti- 
mate. County and subcounty estimates are controlled to state and county 
totals, respectively. 
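
The "controlling" step can be illustrated with a minimal sketch. It assumes simple proportional scaling, as the text describes for state estimates; the function name and all figures are hypothetical and are not Census Bureau data.

```python
def control_to_total(estimates, control_total):
    """Scale estimates proportionally so that they sum to control_total."""
    raw_total = sum(estimates.values())
    if raw_total <= 0:
        raise ValueError("estimates must have a positive sum")
    factor = control_total / raw_total
    return {area: value * factor for area, value in estimates.items()}


# Hypothetical uncontrolled state estimates and a national control total.
states = {"A": 1_020_000, "B": 2_050_000, "C": 980_000}
controlled = control_to_total(states, 4_000_000)
```

Proportional scaling preserves each area's share of the uncontrolled total, so the relative sizes of the areas are unchanged.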

The Census Bureau uses several methods to produce postcensal popula- 
tion estimates. To estimate total U.S. population, a component method is 
used to account for births, deaths, and net immigration. State and county 
population estimates are derived as averages of the results of three
procedures: a component method, component method II (CM II); an
administrative records method (AR); and a ratio-correlation method (RC).
Generally, subcounty estimates are derived from the AR method alone.

CM II and AR are component methods that analyze population
change by estimating the demographic facts of birth, death, and
migration. Ideally, information about components of population change could
be recorded from time to time as events of birth, death, and changes of 
residence occur. Updating the population level of an area would then be a 
simple matter of adding to the population at some initial time the com- 
ponents of population change during the period up to the reference date of 
interest. Such an ideal situation is far from the case for the United States. 
People changing their place of residence are not required to report to a 
central agency. Births and deaths are registered individually by place of 
occurrence (rather than by place of residence); the aggregate statistics are 
tabulated by place of residence for all counties and for all subcounty 
jurisdictions with (1970) population of more than 10,000 but not generally 
for subcounty jurisdictions with population of less than 10,000. 
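
The ideal bookkeeping described above amounts to a simple accounting identity: population at the reference date equals the base population plus births, minus deaths, plus net migration. The following is a minimal sketch of that identity only; the class and function names and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Components:
    births: float
    deaths: float
    net_migration: float  # inmigrants minus outmigrants over the period


def update_population(base_population, components):
    """Add the components of change to the base-date population."""
    return (base_population
            + components.births
            - components.deaths
            + components.net_migration)


# Hypothetical area: 10,000 residents at the base date.
estimate = update_population(10_000, Components(births=150, deaths=90, net_migration=-40))
```

In practice each component must itself be estimated, which is where the methods discussed in this section differ.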

Less information is available on internal migration than on births and 



That is, AR matches individual tax returns for successive periods and 
determines for each area the numbers of inmigrants, outmigrants, and 
nonmigrants represented by the returns (taxpayers and their dependents). 
From the difference between the inmigration and outmigration rates of 
taxpayers and dependents, a net migration rate is calculated and applied 
to a base population figure, yielding an estimate of net internal migration. 
An important part of this process is determining to which of the 39,000 
geographic areas of residence the tax returns should be assigned. The 
mailing address is often insufficient for determining place of residence, 
and questions on residence were asked on the 1972 and 1975 tax returns. 
The information from these questions is used to construct geographic 
coding guides to assign mailing addresses to places of residence. (For
further discussion, see Appendix A, section 4.1d.)
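
As a rough illustration of the calculation just described, the sketch below converts counts from matched returns into a net migration rate and applies it to a base population. The choice of denominator (persons present at the start of the period, that is, nonmigrants plus outmigrants) is an assumption made here for illustration; the Bureau's exact definition is given in Appendix A. All names and figures are hypothetical.

```python
def net_migration_estimate(inmigrants, outmigrants, nonmigrants, base_population):
    """Apply a net migration rate from matched tax returns to a base population.

    Counts refer to taxpayers plus dependents on matched returns.
    """
    # Assumption: rates are expressed per person present at the start of the period.
    start_of_period = nonmigrants + outmigrants
    if start_of_period <= 0:
        raise ValueError("no matched persons at the start of the period")
    net_rate = (inmigrants - outmigrants) / start_of_period
    return net_rate * base_population


# Hypothetical area: 120 inmigrants, 80 outmigrants, 920 nonmigrants on matched returns.
net_migrants = net_migration_estimate(120, 80, 920, base_population=5_000)
```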

The AR method estimates immigration and emigration separately from 
internal migration. Although alien immigration is legally controlled by the 
Immigration and Naturalization Service, the number of aliens who enter 
and reside in the country without a legal status has been a statistical as 
well as administrative problem. Finally, emigration of many U.S. 
residents to other countries may never be reflected in aggregated 
statistical or administrative records. 

Various categories of people are treated differently in component 
methods. People living in group quarters, such as college students, people 
in institutions, and people in military barracks, are treated separately 
because these special populations are obviously not subject to the same 
“risk” of birth, death, or migration as the rest of the population. In addi- 
tion, whenever appropriate and feasible, estimates of changes in birth, 
death, and migration are also differentiated by age, sex, and race. For
example, at state and county levels, the elderly population of age 65 and
above is treated as a special population, and changes in the number of
elderly people are estimated on the basis of Medicare data.

At the subcounty level, there are complications involved in estimating 
births and deaths because data on births and deaths are generally not 
available for subcounty places with less than 10,000 population 
(representing more than 90 percent of subcounty units); estimation of 
these components of population change must be indirect (see Appendix A, 
section 4.1b for details). 

The ratio-correlation method (RC) is a regression method rather than a
component method. Regression methods are based on the fitting of a rela- 
tionship (usually by least-squares regression) between the population 
change of an area and changes in symptomatic variables. The relation- 
ship, or model, is fitted on the basis of information available for the two 
preceding decennial censuses. The relationship is then used to generate 
postcensal population estimates when current data are substituted for the 
symptomatic variables (see Appendix A, section 2.5 for details). 
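
The general idea of a regression method can be sketched as follows. This is a deliberately simplified illustration with a single symptomatic variable and ordinary least squares; the actual ratio-correlation equations use several variables and are described in Appendix A. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical intercensal ratios (later census / earlier census) for five counties.
enrollment_ratio = [0.95, 1.00, 1.05, 1.10, 1.20]  # symptomatic variable
population_ratio = [0.97, 1.00, 1.03, 1.06, 1.12]  # population change
a, b = fit_line(enrollment_ratio, population_ratio)


def estimate_population(base_population, current_symptom_ratio):
    """Substitute a current symptomatic ratio into the fitted relationship."""
    return base_population * (a + b * current_symptom_ratio)
```

The fitted relationship is assumed to remain valid after the census period in which it was estimated, which is one of the assumptions whose failure can produce error.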

For state population estimates, some of the symptomatic variables used 
are the number of students enrolled in elementary schools, of federal in- 
come tax returns, of registered passenger cars, and of people in the work 
force. At the county level, other variables are included in the equation if 
the data are available for all counties in the state. Another difference in
the application of RC to estimates of the state and county population
involves people living in group quarters. At the state level, RC is used only
for estimates of non-group quarters population under age 65, while the
rest of the state population is estimated as in CM II. At the county level, RC
is used to estimate the whole non-group quarters population.

Occasionally, estimates produced by other methods are included in the 
Census Bureau’s average estimate. For example, a driver’s license address
change method (DLAC) is used by California to estimate county
populations. DLAC is a component method that uses driver’s license address
changes for estimation of net migration. In Florida a housing unit method
(HUM) was used for county estimates in 1975. HUM estimates the
non-group quarters population by the product of the estimated average
number of persons per household and the estimated number of occupied
housing units. These estimates usually are produced not by the Census 
Bureau but by participants in the Federal-State Cooperative Program for 
Local Population Estimates (see section 1.2d). These estimates are more 
often available for counties than for subcounty areas, but some state agen- 
cies also prepare subcounty estimates. Since the Census Bureau requires 
that estimates within a state be the product of a uniform methodology, the 
estimates from these other methods are taken into account only if they are 
provided for all counties or subcounty areas within a state. 
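
The housing unit calculation mentioned above is a single product. A minimal sketch, with a hypothetical function name and hypothetical inputs:

```python
def housing_unit_estimate(occupied_units, persons_per_household):
    """Non-group quarters population = occupied housing units x persons per household."""
    if occupied_units < 0 or persons_per_household < 0:
        raise ValueError("inputs must be non-negative")
    return occupied_units * persons_per_household


# Hypothetical county: 4,000 occupied units averaging 2.5 persons each.
household_population = housing_unit_estimate(4_000, 2.5)
```

Both inputs are themselves estimates, so errors in either carry directly into the result.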

Special censuses for county and subcounty jurisdictions may be under- 
taken by the Census Bureau on the authorization of the appropriate local 
government.⁷ The local government pays the necessary expenses and pro-

⁷The Census Bureau was also required by the Voting Rights Act of 1965 (42 U.S.C. § 1973
aa-5, as amended by P.L. 94-73) to conduct special censuses for jurisdictions meeting certain 
criteria in order to determine whether more than 50 percent of the nonwhite persons in the 
jurisdiction were registered to vote. In the vast majority of cases, however, a special census is 
taken by the Census Bureau only if a local government requests it. 



Oregon, Washington, and California, special censuses are conducted 
predominantly by state agencies. 

To combine different estimates, the Bureau first controls to higher level 
totals (e.g., all county estimates must sum to the state estimate) and then 
averages the different estimates, assigning equal weights to each. When 
the results of a special census are available for a county or subcounty area, 
they are used instead of the various postcensal estimates. In those situa- 
tions the adjustment of county (subcounty) estimates to sum to the state 
(county) estimate follows a complicated procedure, sometimes called 
“rake/float” (see Appendix A, section 4.2 for details). 
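
The order of operations described above (control each method's estimates first, then average with equal weights, then substitute special census counts) can be sketched as follows. This sketch does not implement the "rake/float" readjustment, so after a substitution the results need not sum to the state total. All names and figures are hypothetical.

```python
def combine_estimates(method_estimates, state_total, special_census=None):
    """Control each method to the state total, average, then apply special censuses.

    method_estimates maps a method name to a {county: estimate} dictionary.
    """
    controlled = []
    for estimates in method_estimates.values():
        factor = state_total / sum(estimates.values())
        controlled.append({county: value * factor for county, value in estimates.items()})
    counties = list(controlled[0])
    averaged = {
        county: sum(method[county] for method in controlled) / len(controlled)
        for county in counties
    }
    # Special census counts replace the averaged estimates where available.
    averaged.update(special_census or {})
    return averaged


methods = {
    "CM II": {"X": 600, "Y": 600},
    "AR": {"X": 400, "Y": 600},
}
combined = combine_estimates(methods, state_total=1_000)
with_census = combine_estimates(methods, state_total=1_000, special_census={"Y": 560})
```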

State population estimates, whether provisional or revised, are derived 
as equally weighted averages of the estimates from the component method 
II, the ratio-correlation method, and the administrative records method. 
The methods used to produce county population estimates vary, depend- 
ing on whether the estimates are provisional, preliminary, or revised. 
Generally (for exceptions and more details, see Appendix A, section 3.1), 
revised county estimates are derived as equally weighted averages of the 
estimates from CM II, AR, and RC. Preliminary county estimates (used for
general revenue sharing) are generally obtained as the sum of the previous
year’s revised estimate plus the average of two estimates of change during
the year, one derived from CM II and the other from AR.⁸ Provisional
county estimates are obtained as the sum of the previous year’s revised 

⁸The weighting for year t can be represented as follows (for simplicity we ignore the inclusion
of locally prepared estimates):

\[
\begin{aligned}
\text{preliminary estimate}(t) &= \text{revised estimate}(t-1)
  + \tfrac{1}{2}\,[\mathrm{CM\,II}(t) - \mathrm{CM\,II}(t-1)]
  + \tfrac{1}{2}\,[\mathrm{AR}(t) - \mathrm{AR}(t-1)] \\
&= \tfrac{1}{3}\,\mathrm{RC}(t-1) - \tfrac{1}{6}\,\mathrm{CM\,II}(t-1) - \tfrac{1}{6}\,\mathrm{AR}(t-1)
  + \tfrac{1}{2}\,\mathrm{CM\,II}(t) + \tfrac{1}{2}\,\mathrm{AR}(t).
\end{aligned}
\]

To derive the second equality, note that
$\text{revised estimate}(t-1) = \tfrac{1}{3}\,[\mathrm{RC}(t-1) + \mathrm{CM\,II}(t-1) + \mathrm{AR}(t-1)]$.
The RC estimates for year t are not used for deriving the provisional
estimate for year t because they are not available at the time the provisional estimates are
produced: they are used for the revised estimates.
estimate plus an estimate of change during the year, where the change is
estimated either by CM II alone or by the average of CM II and HUM.
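
The weighting given in the footnote on preliminary county estimates can be checked numerically. The sketch below computes the preliminary estimate both ways (previous revised estimate plus averaged change, and the equivalent weighted sum) and confirms that they agree. The input values are hypothetical.

```python
def revised(rc, cm2, ar):
    """Revised estimate: equally weighted average of RC, CM II, and AR."""
    return (rc + cm2 + ar) / 3


def preliminary(prev_revised, cm2_prev, cm2_now, ar_prev, ar_now):
    """Previous revised estimate plus the average of the CM II and AR changes."""
    return prev_revised + 0.5 * (cm2_now - cm2_prev) + 0.5 * (ar_now - ar_prev)


def preliminary_weighted(rc_prev, cm2_prev, ar_prev, cm2_now, ar_now):
    """The same quantity written as a weighted sum of the individual estimates."""
    return rc_prev / 3 - cm2_prev / 6 - ar_prev / 6 + cm2_now / 2 + ar_now / 2


# Hypothetical county estimates for years t-1 and t.
rc_prev, cm2_prev, ar_prev = 10_100, 10_000, 9_900
cm2_now, ar_now = 10_300, 10_100

direct = preliminary(revised(rc_prev, cm2_prev, ar_prev), cm2_prev, cm2_now, ar_prev, ar_now)
weighted = preliminary_weighted(rc_prev, cm2_prev, ar_prev, cm2_now, ar_now)
```

The negative weights on the previous year's CM II and AR estimates show that the preliminary figure is not a simple average of current estimates.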

Table 1.2 summarizes the application of the methods for providing dif- 
ferent estimates. 

1.2b PER CAPITA INCOME ESTIMATION

The Census Bureau’s definition of per capita income (PCI) is the average
amount of income received per person during the preceding calendar year 
by all persons residing within a defined political jurisdiction as of the 
estimate date. (The methodology is summarized in Appendix B.) The per 
capita income estimates are based on the concept of money income. The 
Bureau of the Census defines total money income as the sum of (1) wages 
and salary income, (2) net farm self-employment income, (3) net nonfarm 
self-employment income, (4) social security and railroad retirement in- 
come, (5) public assistance income, and (6) all “other” sources of money 
income including interest, dividends, pensions, unemployment insurance, 
alimony, veterans’ payments, etc. The total money income represents in- 
come received prior to personal income tax, union dues, or any other 
deductions. 

The PCI estimates for different geographic levels are, like population 
estimates, produced in a hierarchical manner. State estimates are pro- 
duced, then county estimates, and last, subcounty estimates. County (and 
subcounty) estimates are controlled to the state (and county) estimates in 
several ways. For example, the estimates of wages and salary income for 


TABLE 1.2 Methods Used by the Census Bureau for Making Substate
Population Estimates

                          County
          ------------------------------------
Method    Provisional   Preliminary   Revised     Subcounty

CM II     X             X             X
AR                      X             X           X
RC                                    X


Note: When more than one method is listed, the estimates are averaged. The state and 
county provisional and revised estimates are derived by adding to a previous revised estimate 
(or census count) the change calculated by the method or average of methods used. For coun- 
ties and subcounty areas in some states, additional methods are used by state agencies par- 
ticipating in the fscp, and the resulting estimates are averaged by the Census Bureau with 
the Bureau’s estimate(s). 



returns, and the remaining five components of money income are updated
on the basis of BEA estimates of personal income.

Personal income and total money income are different concepts of
income, and the BEA data must be adjusted. For example, the BEA data
refer to income where produced (place of work) rather than income where
received. Adjustments are performed to convert the BEA data to a place of
residence basis, as used by the money income concept. These adjustments
can be substantial for areas where many workers commute. Also, the BEA
data include estimates of in-kind income, such as imputed rents and food
produced for home consumption. In-kind income is not a component of
money income and must be excluded from the BEA data before they can be
used to update money income. Other adjustments are also made in the
BEA data to attain compatibility with the money income concept. (See
Appendix B for further discussion of the role of BEA’s personal income
estimates.)

County PCI updates are developed in generally the same manner, except 
that the Census Bureau updates county wages and salary income intact as 
a per capita figure, on the basis of changes in IRS data on gross income per
exemption on the individual income tax returns. Another difference
between the methodology for county PCI and state PCI centers on the
estimation of farm self-employment income. Farm income is notoriously volatile,
capable of sharp year-to-year changes, which may be understated or 
overstated by the data used to estimate them. To prevent unwarranted 
sharp fluctuations in its estimates of county farm income, the Census 
Bureau uses two farm income estimates and constrains the rates of 
changes in these estimates. (See Appendix B for details.) 

Subcounty PCI is estimated roughly the same way as county PCI. Special
considerations are necessary because BEA data are not available for
subcounty areas. To update subcounty PCI, the Census Bureau decomposes
money income into two parts: transfer income (TI) and adjusted gross
income (AGI). TI is composed of social security income, public


assistance income, and some parts of “other” income, such as
unemployment compensation and veterans’ payments. AGI is the rest of money
income. The Bureau estimates TI by assuming that the rate of change in
subcounty TI is the same as the rate of change in county TI. Change in AGI
is estimated from the income reported on income tax returns. The rates of
change are applied to base period estimates to yield estimates of the level 
of postcensal per capita income. 
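
The two-part update for subcounty areas can be sketched as below. The sketch assumes that both components are held as per capita amounts and that change is expressed as a ratio of the current value to the base value; these are simplifying assumptions for illustration, and the names and figures are hypothetical.

```python
def update_subcounty_pci(base_ti_per_capita, base_agi_per_capita,
                         county_ti_change_ratio, agi_change_ratio):
    """Update per capita income as the sum of updated TI and AGI components.

    TI borrows the county's rate of change; AGI uses the area's own
    rate of change from income tax returns.
    """
    updated_ti = base_ti_per_capita * county_ti_change_ratio
    updated_agi = base_agi_per_capita * agi_change_ratio
    return updated_ti + updated_agi


# Hypothetical place: $600 TI and $2,400 AGI per capita in the base period.
pci = update_subcounty_pci(600.0, 2_400.0, county_ti_change_ratio=1.20, agi_change_ratio=1.10)
```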

Because the postcensal pci estimates are obtained by applying rates of 
postcensal change to base period estimates, weaknesses in the 1970 census 
estimates affect the postcensal estimates. The 1970 census estimates of pci 
(calendar year 1969) were based on 20-percent samples, and so the pci 
estimates for the smallest places are subject to large sampling variance. 
Hence the Census Bureau did not attempt to estimate directly the 1972 pci 
for places with 1970 population under 500 but used the county pci 
estimate for these places. Using recently developed statistical techniques
(Fay and Herriot, 1979; see also Appendix J), the Bureau was
subsequently able to revise its estimates of 1969 PCI and produce PCI estimates
for those small places.

Numerous other substitutions, constraints, and edits to the data are 
used to adjust for weaknesses in the data. For example, to compensate for 
conceptual differences between BEA and Census Bureau income concepts,
the county farm income estimates are constrained to fall within 80-120
percent of an alternative estimate.⁹ This constraint affects about
one-quarter to one-third of the counties. Many of the substitutions, edits, and
constraints for the subcounty data are designed to protect against errors in
attributing IRS tax returns to the wrong geographic area. The problem of
assigning the correct geographic area of residence for the filer of a tax 
return is significant for the per capita income estimates as well as for the 
population estimates. Other constraints restrict estimates of relative 
change for subcounty units to be close to the relative change for the county 
as a whole. These constraints damp changes but yield more plausible and 
presumably more reliable subcounty estimates. Complicated controls are 
also employed to force subcounty estimates for classes to sum to county 
totals. (For further discussion, see Appendix B.) 
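
The 80-120 percent constraint on county farm income mentioned above is a clamp. The sketch below assumes a positive alternative estimate; how negative farm income is handled is not stated here and is not addressed. The function name and figures are hypothetical.

```python
def constrain_to_band(estimate, alternative, lower=0.80, upper=1.20):
    """Force an estimate to lie within lower..upper times an alternative estimate."""
    if alternative <= 0:
        raise ValueError("this sketch assumes a positive alternative estimate")
    low, high = lower * alternative, upper * alternative
    return min(max(estimate, low), high)


# Hypothetical farm income figures (thousands of dollars).
too_high = constrain_to_band(150.0, 100.0)   # pulled down to the upper bound
too_low = constrain_to_band(70.0, 100.0)     # pulled up to the lower bound
unchanged = constrain_to_band(95.0, 100.0)   # already inside the band
```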


1.2c REVIEW OF THE ESTIMATES: CHALLENGE PROCEDURES 

An important part of the estimation program is the process of local 
review. Before the population estimates are published, the Census Bureau 

⁹This alternative estimate is the “gross change” farm income estimate: see the section on
county updates in Appendix B. 



be changed. The Bureau keeps a log of all these challenges and subjects
each to a detailed review. This review includes examination of data
provided by the locality in support of its challenge and also a second careful
check of the data used by the Census Bureau to derive its estimate. In
some cases the Bureau revises its estimate. More often the local
authorities do not provide sufficient data to support their challenges, and
the Census Bureau declines to revise its estimate. In the latter case,
informal discussions take place between officials of the local area and of the
Census Bureau to try to resolve the challenge. If these informal
discussions fail, a state or unit of local government may request a formal
hearing.

The local review process for per capita income estimates is slightly
different. These estimates are sent for review not to each local area but rather
to members of the Bureau of Economic Analysis’s “user group.” This
group comprises several people from each state who review the BEA
personal income estimates for counties and the census per capita income
estimates for counties and subcounty areas in their respective state. They
forward comments on the estimates directly to the Census Bureau. The
local officials themselves have an opportunity to review their estimates
when the Office of Revenue Sharing gives them advance notice about the
data elements on which their GRS allocations will be based. At this point
the local areas may challenge the Office of Revenue Sharing or the Census
Bureau. In either case, the Census Bureau will review its estimate and
possibly revise it. Local areas usually have few data with which to support
their challenges. In scrutinizing the derivation of the estimate, the Bureau
may nevertheless discover an anomaly and revise its per capita income
estimate. If the Bureau fails to revise the per capita income estimate to the
local government’s satisfaction, that government may request a formal
hearing.

The Bureau has only recently established the procedure for a formal
hearing. The major provisions of the procedure (1) require that an
informal challenge be filed within 180 days after the release of the estimates,
(2) require that informal review be completed before a formal hearing is
allowed, (3) provide for the appointment of a hearing officer (employed by

¹⁰A set of rules for the hearings appears in Federal Register (1979, pp. 20,646-20,649).
the Census Bureau but not involved in the preparation of the estimates) to
receive evidence under oath, (4) allow for the cross-examination of both
parties in the proceedings and of any witnesses, and (5) set time limits for
the initiation and completion of the formal challenge proceedings.

In the short period since the provisions for a formal hearing were
established, none has been requested. Neither have there been any
challenges in court to the Census Bureau’s postcensal estimates of
population and income, despite the fact that the complaints and informal
challenges to the Bureau’s estimates are numerous: roughly 50-100 per
year for the income estimates and several thousand per year for the
population estimates.

1.2d FEDERAL-STATE COOPERATIVE PROGRAM

The Census Bureau initiated the Federal-State Cooperative Program for
Local Population Estimates (FSCP) in 1967. The basic goal of the FSCP was
to provide high-quality, consistent series of county population estimates
with comparability from area to area. The participants in the FSCP are
officially designated state agencies.

The FSCP plays several roles in the Census Bureau’s postcensal
estimation program. As was mentioned earlier, the FSCP contacts provide to the
Census Bureau many kinds of data used to make the postcensal
population estimates. The state agencies also provide review and comment on the
Census Bureau’s preliminary county estimates. This working relationship
is beneficial to the Census Bureau because the FSCP members have easier
access to these data and are in a better position to evaluate the data and
correct some kinds of errors. The FSCP members are also better situated to
discover new or additional data series that can be used in producing
population estimates. The state agencies in the FSCP may also produce
population estimates that the Census Bureau uses in making its own
estimates.

The 49 states now participating in the program (all but Massachusetts)
have designated state agencies to deal with the Census Bureau. While
early efforts were limited to estimates for counties, several members of the
FSCP now produce subcounty estimates as well. When the Census Bureau
uses the FSCP estimates, it first controls them to totals and then averages
them with the Bureau’s own estimates.

At present, the FSCP operates with very modest resources. The Census
Bureau has put considerable energy of skilled professionals into
methodological research, experimentation, and evaluations and into technical
guidance for the states, but unlike other federal-state cooperative
programs (such as the employment, hours, and earnings system of the Bureau


1.3 FINDINGS AND CONCLUSIONS 


1.3a SUMMARY 

The Panel finds that the methodology of the three population estimation 
procedures used by the Census Bureau is generally sound.¹¹ The Panel
also commends the Bureau for attempting to measure the error of its 
estimates and for publishing the results. 

Despite the basic soundness of the estimation methods, however, they 
result in estimates that are directionally biased for some categories of local 
areas. They also result in large random errors for other areas, especially 
small subcounty areas (those with less than 2,500 population) and sub- 
county areas of moderate size (those with up to 25,000 population) 
undergoing rapid growth or decline in population. For example, for sub- 
county areas for which special censuses were taken in 1975, the average er- 
ror in estimates of total population was 23 percent for areas with less than 
500 population and 10 percent for areas with 1,000-2,499 population. 
(More than one-third of the subcounty areas eligible for grs funds had 
less than 500 population in 1975.) The average error for areas of very 
rapid population growth — defined here as an increase in population of 50 
percent or more between 1970 and 1975 — varied from 27 percent for those 
with less than 500 population to 19 percent for places with 10,000-24,999 
population and then dropped sharply to 7 percent for places with 25,000 
or more population (see Table 2.8). 

The Panel has several proposals for technical modifications of the 
estimation procedures, which may improve the accuracy of the estimates 
to some degree, especially for counties and large cities. However, the 

¹¹For county estimates the Bureau uses unweighted averages of three methods: component
method II, ratio-correlation, and administrative records. For subcounty estimates, available 
data permit the use of only the administrative records method. 
Panel knows of no feasible procedure within the limits of present data
sources that would significantly reduce the errors in population estimates 
for small subcounty areas. Accurate estimates for small areas cannot be 
developed unless data collection is increased enormously, by such means 
as more frequent censuses or a compulsory registration system.¹²

The task of estimating per capita income for small areas is even more 
formidable than that of estimating population. Because of severe data 
problems, the estimates of postcensal per capita income are less accurate 
than those for population. The Panel does not have any recommendations 
for improving the methodology for estimates of per capita income, nor do 
we know of alternative data sources that might produce substantially more 
accurate estimates at acceptable cost. 

In our opinion the Census Bureau is in an unnecessarily difficult situa- 
tion. It is required to defend (to the last digit) population estimates that its 
own analyses have shown may have relative errors of 25 percent or more. 
The mushrooming amount of legislation that authorizes distribution of 
funds on the basis of population or income estimates for small areas gives 
increased incentive for officials from these areas to challenge the Bureau’s 
figures, and an increasing share of the Bureau’s energies must be devoted 
to these challenges. 

In evaluating the Census Bureau’s program for postcensal estimates, 
the Panel assessed the accuracy of the estimates and examined the logic of 
the methodology used to produce the estimates. In addition, the Panel 
tried to identify some of the key decisions made when the statistical 
methodology was developed. 

The available information indicates that the postcensal population 
estimates are most accurate for areas with large populations and moderate 
rates of population growth or decline. The relative error¹³ of the estimates
increases as the population size of the area decreases and also as the per- 
cent change in population (growth or decline) increases. In general, the 
estimates for counties are quite accurate: the average relative error was 
3.9 percent for 133 counties in which special censuses were taken from 
1974 to 1976. The population estimates for subcounty areas with small 
populations were highly inaccurate: the average relative error was 23 per- 
cent for subcounty areas with less than 500 population and 10 percent for 

¹²The recent report of the National Commission on Employment and Unemployment
Statistics (1979) arrived at a similar finding about labor force statistics: that there is no way, 
at reasonable cost, to produce accurate employment and unemployment statistics on a cur- 
rent basis for thousands of local areas. 

¹³The measure of average relative error used was the arithmetic mean of the percent
differences (disregarding sign) between the population estimate and the special census count,
generally referred to as the “average percent difference.” 
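
The "average percent difference" defined in the footnote above can be computed as in the following sketch. It assumes that each percent difference is taken relative to the special census count; the function name and the data are hypothetical.

```python
def average_percent_difference(estimates, census_counts):
    """Mean of absolute percent differences between estimates and census counts."""
    if len(estimates) != len(census_counts) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    # Assumption: the special census count is the base of each percentage.
    differences = [
        abs(estimate - count) / count * 100
        for estimate, count in zip(estimates, census_counts)
    ]
    return sum(differences) / len(differences)


# Hypothetical estimates and special census counts for three places.
apd = average_percent_difference([110, 190, 300], [100, 200, 300])
```

Because signs are disregarded, this measure describes the typical size of the error and says nothing about directional bias.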



Estimation of postcensal per capita money income is an especially dif- 
ficult task. As was noted above, because of severe data problems the post- 
censal estimates of per capita income are less accurate than those of 
population. The limited evidence available indicates that accurate income 
estimates cannot be produced even for subcounty areas with populations 
from 10,000 to 20,000. No evaluation data were available for county 
estimates. (See section 1.3d for more discussion and section 2.3 for 
details.) 

The methodology of the Census Bureau’s per capita income estimates is 
well designed, but problems exist because of data limitations and because 
of the conceptual basis for the estimates (see section S.le). The estimation 
procedure draws heavily on the county personal income estimates of the 
Bureau of Economic Analysis. Personal income and money income have 
different conceptual bases. Hence complicated adjustments of question- 
able accuracy must be applied to the personal income data; the problems 
are particularly severe for areas in which farm income is a substantial part 
of total income. These areas include many of the smaller subcounty areas 
and counties. 

1.3b POPULATION ESTIMATES 

In evaluating the quality of the estimates of population the Panel has ex- 
amined both the logic and the accuracy of the techniques used to produce 
the estimates. In examining the logic we considered whether the pro- 
cedures made sense from the standpoint of demographic and statistical 
theory. To study the accuracy of the estimates, we relied primarily on 
comparisons of postcensal estimates with the results of special censuses 
taken during the 1970s. 

Special censuses are censuses conducted for municipalities, townships, 
or counties within a state that are not part of a national effort. Places that 
have special censuses are usually self-selected; they are not a random sam- 
ple of all places. They choose to have a census, and they pay for it. Areas 
are more likely to have special censuses if they expect a special census to 
document a substantial increase in population. Such places tend to have 
higher-than-average growth rates, and it is known that postcensal popula- 
tion estimation methods perform worse for those places than for slowly 
growing places. A way to avoid the bias that exists in the selection of 
places for special censuses would be for the Bureau to underwrite the cost 
of special censuses for a probability sample of local areas as it did in 1973 
for 86 areas. Such a sample would be the strongest way to test the ac- 
curacy of the estimates, but it would be prohibitively expensive to do for a 
sufficiently large number of areas to provide reliable estimates of error for 
the full range of population-size and rate-of-growth subgroups of areas. 


1.3b(1) County Estimates 

Estimates of county population are, on the average, quite accurate. For 
133 counties receiving special censuses between January 1, 1974, and 
December 31, 1976, the average difference (disregarding sign) between 
the postcensal estimates and adjusted special census counts was 3.9 per- 
cent. The accuracy of the estimates varied with the population size of the 
county, and with the percent change in population size, as follows (data 
are from Table 2.1). 




                            1970 Population
                                      Under     1,000-    5,000-    25,000-   100,000
                            Total     1,000     4,999     24,999    99,999    or More
Average percent difference  3.9       7.1       5.2       3.6       2.9       1.4
Number of counties          133       24        23        32        22        32

                            Percent Change in Population, April 1, 1970, to
                            July 1, Year of Special Census
                            -5.0      -0.0 to   +0.0 to   +5.0 to   +15.0 to  +25.0
                            or More   -4.9      +4.9      +14.9     +24.9     or More
Average percent difference  5.1       ...       3.3       2.4       3.6       6.8
Number of counties          11        16        25        42        17        22


The populations of large counties are estimated more accurately than 
those of small counties; those of slowly growing counties are estimated 
more accurately than those of rapidly growing or declining counties. For 


The percentage base is the special census adjusted (by linear interpolation or extrapola- 
tion) to refer to the nearest July 1, which is the date for the postcensal estimate. 
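
The accuracy measure quoted above (the average percent difference, disregarding sign, with the adjusted special census count as the base) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and figures are hypothetical, and the adjustment is assumed to be a straight-line interpolation or extrapolation from the April 1, 1970, census count through the special census count, a detail the text does not spell out.

```python
from datetime import date

CENSUS_1970_DATE = date(1970, 4, 1)


def adjust_to_july1(census_1970, special_count, special_date, july1):
    """Move a special census count to July 1 along a straight line.

    Assumption: the line runs from the 1970 census count (April 1, 1970)
    through the special census count on its own date.
    """
    days_to_special = (special_date - CENSUS_1970_DATE).days
    days_to_july1 = (july1 - CENSUS_1970_DATE).days
    change_per_day = (special_count - census_1970) / days_to_special
    return census_1970 + change_per_day * days_to_july1


def percent_difference(estimate, adjusted_count):
    """Signed percent difference; the adjusted special census is the base."""
    return 100.0 * (estimate - adjusted_count) / adjusted_count


def average_absolute_percent_difference(pairs):
    """Average of percent differences, disregarding sign.

    pairs: iterable of (postcensal estimate, adjusted special census count).
    """
    pairs = list(pairs)
    return sum(abs(percent_difference(e, c)) for e, c in pairs) / len(pairs)


# Hypothetical areas: one overestimated by 4 percent, one underestimated by 5.
example = average_absolute_percent_difference([(104, 100), (95, 100)])
```

Because the sign is disregarded, an overestimate and an underestimate do not cancel; the measure describes typical size of error, not bias.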




There also is evidence of bias in the county estimates: they tend to 
underestimate the change in population since the last census, both when 
population is increasing and when it is declining. For example, the 
estimates were too high for 8 of 11 counties that declined in population by 
5 percent or more, while estimates were too low for 32 of 39 counties that 
had grown by 15 percent or more (see Table 2.2). 
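
One rough way to gauge whether such lopsided counts could arise by chance is a simple sign test, sketched below in Python. This is a swapped-in illustration, not a calculation reported by the Panel: it assumes that, absent bias, an estimate is equally likely to be too high or too low and that counties are independent. The function name is hypothetical.

```python
from math import comb


def sign_test_tail(k, n):
    """Probability of k or more misses in one direction out of n,
    assuming overestimates and underestimates are equally likely."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Counts taken from the text: 8 of 11 declining counties were overestimated,
# and 32 of 39 fast-growing counties were underestimated.
p_declining = sign_test_tail(8, 11)
p_growing = sign_test_tail(32, 39)
```

Under these assumptions the group of 39 growing counties gives much stronger evidence of a systematic tendency than the group of 11 declining counties, simply because it is larger and more lopsided.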


1.3b(2) Subcounty Estimates 

Estimates of population for subcounty areas are less accurate than 
estimates for counties of the same size. For example, counties with 1,000 
to 4,999 population had an average error of 5.2 percent; subcounty areas 
of the same size had an average error of 8.8 percent (see Tables 2.1 and 
2.9). 

The population estimates of subcounty areas in 1975 were quite ac- 
curate for areas with large populations but were increasingly inaccurate as 
population size decreased. For example, the average percent difference 
between 1975 population estimates and comparable 1975 special census 
counts was only 2.6 to 2.7 percent for areas with population of 25,000 or 
more in 1970 but increased to more than 25 percent for areas that had less 
than 250 population in 1970 (see Table 2.7). 

The accuracy of the estimates also varied greatly by the rate of popula- 
tion change between 1970 and 1975. Areas with relatively stable popula- 
tions — less than 5 percent growth or decline — had an average error of 6 
percent; areas that grew by 50 percent or more or that declined by at least 
10 percent had average errors of more than 20 percent. 

Estimates for areas that were both small and had experienced rapid 
growth or decline were most inaccurate. For example, for subcounty areas 
with less than 500 population that declined in population by 10 percent or 
more between 1970 and 1975 the average error was 43 percent; for areas of 
the same size that grew by 50 percent or more the average error was 28 
percent (see Table 2.8). 

In general, very small areas (those with less than 500 population) had 

This evaluation of the accuracy of postcensal estimates of population for subcounty areas 
was based on comparisons of 1975 population estimates with special census counts for 799 
subcounty areas that were taken in 1975. 

large errors regardless of the rate of change in population. Only large 
areas (25,000 or more population) had relatively accurate estimates 
regardless of the rate of change in population; average percent differences 
in this population-size group varied from 2.4 percent for areas that 
changed by less than 10 percent (growth or decline) to 6.6 percent for 
those that grew by 50 percent or more. 

There is also strong evidence of bias in the subcounty population 
estimation methods: they consistently tended to underestimate the 
population of growing areas and to overestimate the population of declin- 
ing areas. In our comparisons, for example, more than 84 percent of the 
estimates for areas that declined in population by 10 percent or more be- 
tween 1970 and 1975 were overestimates, and more than 91 percent of the 
estimates for areas that grew by 50 percent or more were underestimates. 

The low levels of accuracy of the estimates for small areas, and for areas 
undergoing rapid growth or decline, are also evident in measures of “ex- 
treme error.” Over half of the subcounty areas with less than 500 popula- 
tion had relative errors of 15 percent or more. 

The measures of error discussed thus far are relative errors in the 
estimates of total population of subcounty areas. But the estimation 
methods are designed to measure change in population since the last cen- 
sus (since the total population estimates are obtained by adding the 
estimated change to the previous census counts). Moreover, the usefulness 
of the estimates as updates for the purpose of allocating general revenue 
sharing funds between regular censuses depends on the accuracy of the 
estimated changes in population. Hence the Panel also calculated 
measures of the relative error in the estimates of change in population 
since the last census. 

The errors based on change in population were, for the most part, many 
times larger than comparable errors in the estimates of total population, 
and the pattern of error was substantially altered. Subcounty areas subject 
to little growth or decline had the largest relative errors based on change 
in population, whereas the fast-growing areas had much smaller errors. 
From this perspective the greater accuracy of estimates of total population 
for slowly or moderately changing areas as compared with areas of rapid 
growth or decline can be explained by the fact that their change in popula- 
tion from 1970 to 1975 was a smaller proportion of their total population 
in 1975 than was the case for areas undergoing more rapid growth or 
decline. 

Relative errors based on change in population were calculated as averages of the percent 
differences between estimated change in population (1975 estimate minus 1970 census 
count) and enumerated change in population (1975 adjusted census count minus 1970 census 
count); see section 2.2c for further discussion. 
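
The contrast between the two error measures can be seen in a small Python sketch. The definitions follow the note above; the function names and the two example areas are hypothetical, chosen only to show that the same absolute miss yields very different relative errors depending on the denominator.

```python
def relative_error_total(estimate, enumerated):
    """Percent error in the estimate of total population."""
    return 100.0 * abs(estimate - enumerated) / enumerated


def relative_error_change(estimate, census_1970, enumerated):
    """Percent error in the estimate of change since the 1970 census.

    Returns None when the enumerated change is zero, because the
    percentage is then undefined.
    """
    estimated_change = estimate - census_1970
    enumerated_change = enumerated - census_1970
    if enumerated_change == 0:
        return None
    return 100.0 * abs(estimated_change - enumerated_change) / abs(enumerated_change)


# Hypothetical slowly growing area: 1,000 in 1970, 1,020 enumerated in 1975.
slow_total = relative_error_total(1071, 1020)
slow_change = relative_error_change(1071, 1000, 1020)

# Hypothetical fast-growing area: 1,000 in 1970, 1,600 enumerated in 1975.
fast_total = relative_error_total(1520, 1600)
fast_change = relative_error_change(1520, 1000, 1600)
```

In both hypothetical areas the error in total population is 5 percent, but the numerator is divided by a change of 20 persons in the first case and 600 in the second, so the error in estimated change is far larger for the slowly growing area.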
ology the Bureau needed methods that could be used to produce estimates 
for all the 36,000 subcounty jurisdictions participating in general revenue 
sharing. The input data (irs individual income tax records) for the ar 
method are available for all subcounty units, but data needed for the other 
methods used to produce county estimates are not available for all grs 
jurisdictions below the county level. 

The AR method was developed after 1970. The first tests of the method 
were performed for 16 counties and 8 subcounty areas with populations of 
more than 50,000 (Bureau of the Census, 1975b; Zitter, 1972; Zitter and 
Word, 1973). Later the ar estimates were tested against special censuses 
taken in 1973. These tests (Bureau of the Census, 1975b, Tables D-G) in- 
cluded comparison of the ar estimates with results of special censuses for 
a probability sample of 86 areas with population of less than 20,000 and 
for 165 areas where special censuses were conducted by the Census Bureau 
at the request and expense of the locality (these were not a random sample 
of subcounty areas). The Bureau’s decision to use the ar method to make 
subcounty population estimates was based partly on these limited tests 
and partly on a priori considerations relating to the extensive coverage of 
the irs data and the lack of workable alternatives (Bureau of the Census, 
1975b). While more testing would have been desirable, it is the view of the 
Panel that the Census Bureau did as much as might reasonably be ex- 
pected, given the pressures of time after the general revenue sharing 
legislation was drafted. The Panel believes, however, that more testing is 
called for when decisions about the choice of future estimation methods 
must be made. 

For county population estimates, several methods are available. Prior to 
the 1970s the Census Bureau had traditionally relied primarily on four 
types of estimates: ratio-correlation, component method II, composite, 
and vital rates. Tests for 2,586 county estimates against the 1970 census 
(Bureau of the Census, 1973b, Table C) indicated that the Bureau’s rc 
method was clearly superior to the other three methods but that there were 
circumstances in which judicious averaging of cm ii or composite method 
estimates with rc estimates produced results that were better than those 
obtained with rc alone (Bureau of the Census, 1973b, Table D). 

On the basis of these tests the Bureau decided to use its ratio- 
correlation method (rc) and component method II (cm ii) in its county 
estimation methodology. The Bureau dropped the composite method for 
use in the 1970s, despite its good test performance. The reasons for drop- 
ping the composite method are not reported in publications, but Bureau 
staff have indicated that it was done for the same reasons that births were 
dropped as a predictor variable in the rc estimation of state populations 
(see Bureau of the Census, 1974), namely, because of changes in laws per- 
Census Bureau. These tests indicate a large difference between 1973 post- 
censal estimates of pci and pci measures from special censuses taken in 
1973. The average difference (without regard to sign) for all places with 
population of 1,000 to 20,000 was 10 percent of the special census pci. For 
places with a 1970 population of less than 500 the average difference was 
28 percent of the special census pci. For places with populations between 
500 and 999 the average difference was 17 percent (see Table 2.13, column 
4). After revisions to their methodology for places with population of less 
than 1,000, the Census Bureau reduced the estimated levels of error for 
these small places by a few percentage points (see section 2.3). 

The available data are not adequate to draw any conclusions about how 
the error levels decrease as population size of the areas increases. Theoret- 
ically, the Census Bureau should be able to make more accurate pci 
estimates for larger areas than for smaller ones. The base figure with 
which the Bureau must work is more accurate for larger areas, and subse- 
quent adjustments should be more accurate. Adjustments to the 1970 cen- 
sus data are easier for metropolitan areas because wages and salaries are 
such a large component of income and current estimates of these are 
available. Estimating wages and salaries is a relatively simple task com- 
pared to estimating proprietors’ incomes, that is, the net business earn- 
ings of owners of unincorporated enterprises. 

As was noted above, for purposes of allocating general revenue sharing 
funds it is changes in population and per capita income since the last cen- 
sus that produce changes in the allocation of grs funds on postcensal 
dates. Hence it is the accuracy of the estimates of change in per capita in- 
come since the last census that is important in evaluating the use of the in- 
come estimates for updating allocations of grs funds. The relative errors 
in estimates of postcensal change in per capita income are much larger 
than those for estimates of total per capita income, for the same reasons 
cited in our discussion of population estimates for subcounty areas (see 
sections 1.3b(2) and 2.2c). 

There are many problems associated with measuring income trends in 
small areas. Agriculture or farm income is a more important component of 
income in areas with smaller populations than in areas with larger popula- 
tions. In addition, the measurement of entrepreneurial income has concep- 
tual problems not associated with other income figures (see section 5.1e). 

Another problem centers on the volatility of agriculture. It is difficult to 
find a typical year. The Bureau utilizes constraints on estimates of county 
farm income to damp sharp year-to-year changes, but these constraints 
will do little to improve accuracy if the volatility is real and not a figment 
of the data. 

Given the inherent difficulty in measuring income changes, especially 
for areas in which income is largely agricultural, the Panel does not 



RECOMMENDATION 1 

The Panel recommends that the Census Bureau continue to make 
postcensal population estimates for all counties and for all places 
above a certain size. That certain size, the threshold, should be deter- 
mined by a systematic evaluation of estimation methods against the 
1980 census. The Census Bureau should not make postcensal popula- 
tion estimates for places with population below that threshold. 

It is the view of the Panel that the Census Bureau should continue to pro- 
duce estimates for selected subcounty areas; however, estimates should 
not continue to be made for subcounty areas that are too small for ac- 
curate estimates. Although more evaluation is needed before a precise de- 
termination can be made of how small is too small, it is clear that a 
population of 500 is too small. The average relative error of postcensal 
estimates of total population for subcounty areas with less than 500 
population was 23 percent (based on data for areas that had special cen- 
suses taken in 1975). For subcounty areas with 1,000-2,499 population the 
average error in estimates of total population was 10 percent, but this 
represented an average error of 111 percent in the estimate of 1970-1975 
change in population of these areas. In our view, with the data available at 
this time, a population of 5,000 or 10,000 may be a reasonable threshold, 
but a final determination should await a comparison of postcensal esti- 
mates for 1980 with 1980 census counts. In 1975, only 15 percent of the 
subcounty areas (municipalities and townships) had a population of 5,000 
or more, but these areas contained 83 percent of the total population of all 
subcounty areas. Similarly, more than one-third (36 percent) of all sub- 
county areas had less than 500 population, but these areas contained less 
than 2 percent of the total population of subcounty areas. 

RECOMMENDATION 2 

The Panel recommends that the Census Bureau not make postcensal 
estimates of per capita money income below the county level. Serious 
consideration should be given to discontinuing estimates for counties 
as well, but a decision on this should await comparisons of the post- 
censal estimates with the 1980 census. 

The task of estimating postcensal per capita income is even more for- 
midable than that for population. The limited evaluation data available 
indicate that the subcounty per capita income estimates are less accurate 
than the population estimates. No evaluation data were available for the 
county estimates, but the Panel is suspicious of their accuracy, especially 
for those counties for which farm income is a significant component of 
total income. Since the subcounty population and income estimates are 
used to update general revenue sharing allocations, alternative ways of up- 
dating these allocations may need to be considered; some suggestions are 
given below. 

RECOMMENDATION 3 

The Panel urges that responsibilities within the Census Bureau be 
reassigned to bring theoretical and applied statisticians more fully 
into the estimation program, especially in relation to the develop- 
ment, analysis, and review of estimation procedures. The Bureau 
should use expertise from within to pursue methodological innova- 
tions, and when this expertise is not available, the Bureau should 
draw upon appropriate talent from outside. 

The Panel believes that the postcensal estimation program has not re- 
ceived sufficient attention from theoretical and applied statisticians. The 
Bureau has successfully applied a few methodological innovations (such as 
the empirical Bayes estimation methods — see Fay and Herriot (1979)), but 
there is room for more. Examples of underutilized methods for the 
postcensal estimation program include variance components models, em- 
pirical Bayes estimation, time-series models, and use of diagnostic tech- 
niques for model fitting. More research is also needed to develop and ex- 
tend methodology for evaluation of the estimates. The capabilities for im- 
plementing these methods are not sufficiently used in the estimation pro- 
gram at present. 


RECOMMENDATION 4 

The local estimates of population and per capita income should be 
given a full statistical evaluation. This should especially include the 
following: a statement of desired statistical criteria for estimates with 



The Panel’s evaluation of the accuracy of the estimates has rested largely 
on comparisons of the estimates with the results of special censuses car- 
ried out in the 1970s. For various reasons, the areas for which special cen- 
suses were done may not be typical of all areas. More conclusive evalua- 
tion of the estimates awaits comparison of the postcensal estimates with 
the results of the 1980 census. The Panel believes that the use of censuses 
is the best method of evaluating postcensal estimates and primary em- 
phasis should be placed on the 1980 census results as the standard against 
which to compare postcensal estimates. In addition, the Bureau should 
continue to use special censuses for evaluating postcensal estimates (both 
those conducted for other purposes and those conducted specifically for 
evaluating postcensal estimates). 

A promising method of evaluation uses estimates obtained from large, 
high-quality sample surveys, such as the Current Population Sur- 
vey (cps). These sample-survey estimates need not be highly accurate 
themselves if their variances are known. The evaluation of methods would 
be enhanced if the cps were redesigned in minor ways to make it more use- 
ful for the estimation of the population and per capita income of a sample 
of counties and large cities. (Such changes are discussed in section 5.2 

The systematic evaluation of the estimation methodology that the Panel 
recommends will be an expensive undertaking, but the Panel feels that 
sufficient resources should be allocated for this purpose. 

RECOMMENDATION 5 

The Panel recommends that the Census Bureau give serious con- 
sideration to relaxing its uniformity criterion and, instead, strive to 
obtain the most accurate estimates. 

The Census Bureau currently uses a uniform methodology to produce 
estimates. The same methods are used for all counties and for all local 
areas within any one state, regardless of population size, rate of growth, or 
unusual age or other composition factor. If data are available for some but 
not all jurisdictions within a state, the Census Bureau does not incor- 
porate these data (except for special census data) into its own estimates in 
any of the jurisdictions. Relaxation of this uniformity criterion could in- 
crease the accuracy of the Bureau’s estimates. The Panel recognizes, 
however, that administrative and political considerations (such as defense 
against challenges by local areas) were involved in the Bureau’s decision to 
use a uniform methodology. Therefore the Panel feels that the Census 
Bureau should be the judge of the extent to which it is feasible to relax the 
uniformity constraint. (Suggested ways of relaxing this constraint are 
presented in section 5.2b.) 

RECOMMENDATION 6 

The Census Bureau should prepare a report describing in detail and 
explaining the rationale for its methodology for postcensal popula- 
tion and per capita income estimation. 

The Census Bureau’s documentation of its methodology for population 
estimation is currently scattered and incomplete. Since the methodology is 
not described in detail elsewhere, the Panel found it necessary to compile 
such a description, which is found in Appendix A. A model for the kind of 
report we recommend is The Current Population Survey: Design and 
Methodology (Bureau of the Census, 1978b). The report should include 
detailed documentation for the methodology, rationale for the methodol- 
ogy — what is being measured, what criteria of accuracy are employed— 
and evaluations of tests of the methodology. 

RECOMMENDATION 7 

The Panel recommends that the Census Bureau’s Federal-State Co- 
operative Program for Local Population Estimates (fscp) be 
strengthened. In particular, the Census Bureau should seek authori- 
zation and funds to provide resources for state activities and for travel 
and consultation by state personnel with Census Bureau staff. 

The demographic programs in the state agencies need to be strong, fis- 
cally and administratively sound, and professionally staffed. Their 
strengthening would improve and extend the states’ capabilities to provide 
basic data series for local estimates, to evaluate the quality and com- 
pleteness of their data, and to review estimates generated by the Bureau of 
the Census. 


RECOMMENDATION 8 

The Panel recommends that the place of residence question be in- 
cluded in the 1980 irs individual income tax returns and that funds 
be provided to process the data obtained by the question. 



tion is not obtained and analyzed, the Bureau’s ability to maintain the 
accuracy of the local estimates for the 1980s will be impaired. It is clearly too 
late now to collect the needed information on the 1979 returns (with the 
added advantage that particular year would have had, as explained in the 
letter), but it is essential that the information be collected in 1980 (or 
1981) in order to update the procedure for the allocation of mailing 
addresses to appropriate places of residence. 

Recommendations 1 and 2 have significant implications for general 
revenue sharing and other programs that use the postcensal estimates of 
population and income for determining the allocation of funds or other 
resources. For grs (as now structured), if postcensal estimates of popula- 
tion or income are not uniformly available for all subcounty jurisdictions 
in a state, they cannot be used at all. In addition, the reference dates of 
the data for each variable must be the same for all subcounty units. 
These rules imply that the distribution of subcounty proportional shares 
of county area allocations must either (1) remain frozen until the next 
census or (2) be updated solely on the basis of changes in local adjusted 
taxes and, to a lesser extent, intergovernmental transfers of revenue. Even 
if the proportional shares of county area allocations to subcounty areas are 
fixed, the sizes of the subcounty allocations will change because grs 

See Appendix K for copies of the letters. 

Possibly, subcounty per capita income estimates pertaining to different time periods could 
be used for local governments in different counties, provided the time references for all in- 
come estimates within each county were the same. Thus special census results for a sub- 
county area could not be used unless such results were available for all subcounty areas in the 
state or county. 

Determination of county area allocations is an intermediate stage in the application of the 
general revenue sharing formula. The county area allocation equals (if the effects of floors 
and ceilings constraining allocations are neglected) the total allocation to the county govern- 
ment and to all eligible local jurisdictions in the county. 
allocations are determined in a hierarchical manner; state allocations are 
determined first, then divided among county areas and the state govern- 
ment, and then the county area allocations are divided among subcounty 
areas and the county government. Thus the sizes of the subcounty alloca- 
tions will change because the county-area allocations will be updated 
(assuming that per capita income estimates for counties are continued). 

Under the first option, if a mid-decade census is carried out to provide 
accurate population and per capita income statistics for small areas, those 
results could be used to update the subcounty proportional shares of 
county allocations for grs every 5 years. If a mid-decade census is not con- 
ducted or is of insufficient scope to yield accurate small-area statistics, 
then those shares would be updated every 10 years. Currently, only one 
data element in the general revenue sharing formula is not updated be- 
tween decennial censuses: urbanized population of a state. 

It was outside the scope of the Panel’s charge to determine which option 
would result in more accurate allocations for grs. Because of interrela- 
tionships among different data elements, problems can arise if the data 
refer to different time periods. A hypothetical example can illustrate this 
point. A subcounty area’s allocation, if not constrained by floors or ceil- 
ings (see Appendix E), is equal to a fraction of the allocation to the county 
area. The fraction is proportional to the ratio of the subcounty area’s ad- 
justed tax collections to the square of its per capita income, divided by the 
sum of these ratios for all subcounty units in the county. It is plausible 
that changes in local tax collections correspond to some extent to changes 
in per capita income. Suppose percent changes in a local area’s tax collec- 
tions correspond exactly to changes in the square of its per capita income. 
Then if perfectly accurate data were available, the area’s shares of grs 
allocations would not change (ceteris paribus) even though the per capita 
income and adjusted taxes data did change. This is clearly the same out- 
come that would result if no subcounty data were updated. On the other 
hand, updating adjusted taxes data alone would serve to increase alloca- 
tions to areas with the fastest rising per capita income and to decrease 
allocations to areas with slowly rising or even decreasing per capita in- 
comes. This is surely contrary to the intent of the law. This extreme exam- 
ple is presented not to argue for either option but to illustrate that the 
question of whether to update some of the data elements and not others is 
a subtle one and merits careful consideration. 
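
The hypothetical example can be worked through numerically. The Python sketch below implements only the share rule as described in the text (adjusted taxes divided by the square of per capita income, normalized within the county), ignores floors and ceilings, and uses invented figures for two areas. The function name `subcounty_shares` and all dollar amounts are illustrative assumptions, not part of any grs computation.

```python
def subcounty_shares(adjusted_taxes, per_capita_income):
    """Share of the county area allocation for each subcounty unit.

    Each unit's ratio is adjusted taxes / (per capita income squared);
    shares are the ratios divided by their sum. Floors and ceilings are
    ignored, as in the hypothetical example in the text.
    """
    ratios = {
        name: adjusted_taxes[name] / per_capita_income[name] ** 2
        for name in adjusted_taxes
    }
    total = sum(ratios.values())
    return {name: ratio / total for name, ratio in ratios.items()}


# Base period: two identical hypothetical areas.
base = subcounty_shares({"A": 100_000, "B": 100_000}, {"A": 4_000, "B": 4_000})

# Area A's income rises 10 percent and its taxes rise 21 percent
# (1.1 squared is 1.21), so taxes track the square of income exactly.
both_updated = subcounty_shares({"A": 121_000, "B": 100_000},
                                {"A": 4_400, "B": 4_000})

# Only the tax data are updated; income stays at the base-period figures.
taxes_only = subcounty_shares({"A": 121_000, "B": 100_000},
                              {"A": 4_000, "B": 4_000})
```

With both data elements updated, the two areas keep equal shares, the same result as updating nothing. With taxes alone updated, area A, whose income is rising, gains share at the expense of area B.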

Allocations are targeted on problems. The grs formula was designed to 
“put the money where the needs are” (Joint Committee on Internal 
Revenue Taxation, 1973, p. 2). But while the statistical variables used for 
allocation form some kind of measure of the problem, they can never yield 
an exact measure. They are rather, as Bixby (1977) and the Advisory 



type of services needed, the distribution of income, or cost-of-living 
differences, and it is only a limited measure of affluence. 

While the Panel recognizes the desirability of using current data in 
determining allocations, it does not believe that these data can or 
should reflect up-to-the-minute changes. The Panel notes that one impor- 
tant allocation, apportionment of seats in the U.S. House of Representa- 
tives, requires updates of the data only every 10 years. Furthermore, mak- 
ing frequent updates of the data can even work against the intent of the 
legislation. As the House Select Committee on Population (U.S. Congress, 
1978, p. 7) noted: 


Formulas for the distribution of Federal aid typically include a population-size fac- 
tor. Therefore, if an area loses some of its inhabitants, it is likely to lose funds when 
it most needs Federal assistance, during the transition to a smaller tax base and 
changed needs for social services. 


The GRS legislation does not require that updated population and in- 
come estimates be produced; it only requires that the most recent 
estimates that the Census Bureau does produce be used for calculating 
allocations. One way to achieve more extensive updating of the allocations 
than that recommended by the Panel would be to change the legislation 
specifying the allocation formulas to take greater advantage of data for 
large subcounty areas, for which current and relatively accurate updates 
are available. 

For example, in calculating grs allocations for periods for which 
postcensal population estimates are desired, all subcounty jurisdictions 
with populations below a threshold number (e.g., 500, 5,000, or 10,000) 
would initially be treated as an aggregate. Postcensal population estimates 
would be prepared by some procedure for each whole aggregate, and the 
allocation to each aggregate would be determined by formula, possibly in 
the same manner that allocations would be determined for jurisdictions 
with populations exceeding the threshold. Allocations within each ag- 
gregate would be apportioned on the basis of the last decennial census 
figures for population and income (and possibly other data). This pro- 
cedure would allow current updates to be used for the larger subcounty 
areas but not for smaller areas. The example is illustrative and would need 
further refinement to become operational. 
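
The two-stage procedure described above can be sketched in Python. The sketch rests on several loudly stated assumptions that are not in the text: the allocation "formula" is replaced by a placeholder that divides funds in proportion to estimated population, apportionment within the aggregate uses last-census population only (the text also mentions income), and the threshold is applied to last-census population. The function name, the `AGGREGATE` key, and all figures are hypothetical.

```python
AGGREGATE = "__small_areas__"  # hypothetical label for the pooled small areas


def allocate_with_threshold(total_funds, census_pop, postcensal_large,
                            postcensal_small_total, threshold):
    """Split total_funds among jurisdictions, pooling those below threshold.

    census_pop: last decennial census population of every jurisdiction.
    postcensal_large: current estimates for jurisdictions at or above threshold.
    postcensal_small_total: one current estimate for the pooled small areas.
    """
    small = [name for name, pop in census_pop.items() if pop < threshold]
    units = dict(postcensal_large)
    if small:
        units[AGGREGATE] = postcensal_small_total

    # Stage 1 (placeholder formula, an assumption): funds in proportion to
    # the current population estimate of each large area and of the aggregate.
    total_pop = sum(units.values())
    allocation = {name: total_funds * pop / total_pop
                  for name, pop in units.items()}

    # Stage 2: divide the aggregate's allocation using last-census figures.
    if small:
        pooled = allocation.pop(AGGREGATE)
        census_small_total = sum(census_pop[name] for name in small)
        for name in small:
            allocation[name] = pooled * census_pop[name] / census_small_total
    return allocation


example = allocate_with_threshold(
    total_funds=24_000,
    census_pop={"City": 20_000, "Village A": 300, "Village B": 100},
    postcensal_large={"City": 23_600},
    postcensal_small_total=400,
    threshold=500,
)
```

In this sketch the city's allocation reflects its current estimate, while the two villages divide the aggregate's allocation in the 3-to-1 ratio of their last census counts, whatever has happened to each village since.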

The Panel suggests that, in light of its recommendations that per capita 
income no longer be updated below the county level and that serious con- 
sideration be given to not updating per capita income at the county level, 
the Bureau of Economic Analysis’s county personal income statistics (on a 
place of residence basis) be considered for general revenue sharing pur- 
poses as a possible substitute for county per capita money income. State 
personal income currently enters into the determination of state alloca- 
tions, but personal income does not now enter into the substate formula. 
Since personal income as measured by the Bureau of Economic Analysis 
has a conceptual basis consistent with the national income and product 
accounts, it may be a more appropriate proxy than money income in the 
GRS substate allocation formula. The Panel does note, however, that the 
Bureau of Economic Analysis’s county personal income estimates are 
untested and may be no more accurate than the Census Bureau’s county 
money income estimates. 

Possible reduction of the cost of producing estimates was not the 
motivation for the Panel’s recommendations 1 and 2. Should a mid- 
decade census provide accurate local area statistics, however, benefit-cost 
considerations might indicate a further reduction in the scope of the 
estimation program. In particular, if population estimates are made only 
for counties and large subcounty areas, it may be possible to reduce the 
cost of the administrative records method by reducing the extent of the 
geocoding operation. (Only 9 percent of the subcounty areas had 10,000 
or more population in 1975, although those areas contained 74 percent of 
the total population.) 

Sensitivity analyses should be used to explore the effects of alternative 
ways of producing estimates on the accuracy and timeliness of the 
estimates and the effects of those qualities in turn on uses of the estimates. 
Benefit-cost analyses should be done to compare the costs of alternative 
techniques or conventions with the benefit of their effects on the estimates’ 
uses. Explicit benefit-cost analysis poses difficult problems — such as 
specification of how much it is worth spending on data to reduce errors in 
allocation by given amounts — but even if the problems cannot be com- 


These refinements concern (but are not limited to) how to prepare separate estimates for 
portions of jurisdictions that straddle two or more counties, how to identify and treat areas 
that may grow to exceed the threshold during the period of estimation, and how to handle 
townships and municipalities separately (only 1,823 of the 16,822 townships had a popula- 
tion of 5,000 or more in 1975; see Bureau of the Census, 1978a, Table E). 



curate subcounty estimates can also be obtained from a sample census,
but careful consideration should be given to sampling design. Differential
sampling rates should be seriously considered, with relatively low rates for
the largest areas and high rates for very small areas. Below some threshold
it may be more cost effective to obtain population and income data by a
complete census rather than by a sample.

The Panel is concerned about the growing amount of legislation that
authorizes the distribution of funds or other resources on the basis of
postcensal estimates for small areas. Under the present estimation system,
the errors in those estimates are likely to be large. Alternative approaches
that would yield more accurate estimates are either enormously expensive
(e.g., annual censuses) or socially repugnant (e.g., a population register).
The Panel believes that the Census Bureau should not allow itself to be put
in the position of having to defend estimates that are unavoidably subject
to large amounts of statistical error. The pressures on the Census Bureau
from complaints, challenges, and likely adjudication are detrimental to its
efficient operation.

One particularly troublesome use of statistics for allocation purposes is
the determination of whether the population of a given area exceeds a
threshold number (usually 50,000 or 100,000), in which case the area may
become eligible for funds. For cities with populations near these
thresholds it is not possible to say with certainty whether the population is
above (or below) the threshold. Cities for which the Bureau produces
estimates slightly lower than the threshold number are understandably
eager to challenge these figures. The Panel notes and endorses recommen-
dation 9 of the Subcommittee on Statistics for the Allocation of Funds
(Office of Federal Statistical Policy and Standards, 1978):

That, to minimize the effects of data errors, eligibility cutoffs be such that there is
a gradual transition from receiving no allocation to receiving the full formula
amount.
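
The contrast between an all-or-nothing cutoff and a gradual transition can be illustrated with a small sketch. The 50,000 threshold comes from the text; the 45,000 to 55,000 transition band and the linear phase-in are purely hypothetical choices made for illustration, not a rule proposed by the Subcommittee or the Panel.

```python
def hard_cutoff_share(population: float, threshold: float = 50_000) -> float:
    """Fraction of the full formula amount under an all-or-nothing cutoff."""
    return 1.0 if population >= threshold else 0.0


def gradual_share(population: float,
                  lower: float = 45_000,
                  upper: float = 55_000) -> float:
    """Fraction of the full formula amount under a linear phase-in.

    `lower` and `upper` are hypothetical bounds of the transition band.
    """
    if population <= lower:
        return 0.0
    if population >= upper:
        return 1.0
    return (population - lower) / (upper - lower)


# Two estimates that differ by about 2 percent around the threshold.
low_estimate, high_estimate = 49_000, 51_000

# Change in the share of the full amount caused by that small difference.
hard_swing = hard_cutoff_share(high_estimate) - hard_cutoff_share(low_estimate)
gradual_swing = gradual_share(high_estimate) - gradual_share(low_estimate)

print(f"hard cutoff swing:        {hard_swing:.2f}")
print(f"gradual transition swing: {gradual_swing:.2f}")
```

Under the hard cutoff the small difference in the estimate moves the area from no allocation to the full amount, while under the assumed linear band it changes the allocation by only a fraction of the full amount.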
Evaluations of
Estimates


2.1 METHODS OF EVALUATION USED BY THE 
CENSUS BUREAU 

Evaluations of estimates inform both the producers and the users of the 
estimates about the strengths and limitations of the estimators (i.e., esti- 
mation procedures). Evaluations help users determine how well the esti- 
mates meet their needs. If the estimates are highly accurate (or inac- 
curate), users may use them (or decline to use them) in making important 
decisions. If the users are paying for the estimates, they may find less 
costly but less accurate estimates to be acceptable, or if evaluations show 
the estimates to be inaccurate, users may decide to seek better estimates 
even though the expense may be greater. Evaluations aid producers in 
their efforts to correct weaknesses in existing estimators, design new esti- 
mators, average or otherwise combine different estimators, and accept or 
reject estimators. 

The Census Bureau relies primarily on four methods to conduct its 
evaluations of population estimates: comparison of estimates with decen- 
nial census results, comparison of estimates with special census results, 
comparison among alternative estimates, and use of demographic and 
statistical logic (Bureau of the Census, 1974, p. 15).¹ We note that decen-
nial censuses are too infrequent for evaluation purposes and that special
censuses may give a distorted overall picture because the areas receiving 

¹ Demographic and statistical logic focuses on whether the assumptions underlying a pro-
cedure conform to a logical model of how demographic changes occur.



2.1a COMPARISON WITH DECENNIAL CENSUSES 


To use decennial census results to evaluate postcensal estimates, the 
Bureau prepares postcensal estimates for a date for which decennial cen- 
sus results are or will be available. The postcensal estimates and census 
results are then compared; discrepancies are attributed to errors in the 
postcensal estimates, although some of the discrepancies — or lack there- 
of — may arise from errors in the census figures (see Appendix I for further 
discussion). A drawback with using decennial census figures is their infre- 
quent appearance. The performance of estimators can only be evaluated 
and compared for 10-year intervals. Little is known about the behavior of 
estimators as the time interval for which change is being estimated in-
creases. Even when the average error is known for estimates of 10-year
change, there is still uncertainty about the average error when the time in- 
terval is only a few years. Is the variance of a 5-year estimate one-half that 
of a 10-year estimate? One-quarter? Seven-tenths? When mid-decade 
census results become available, this problem will ease somewhat. 
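
The three candidate answers in the question above correspond to different assumptions about how error accumulates. As a rough sketch, suppose (as an assumption for illustration only) that the error variance grows as a power of the elapsed time, so that variance is proportional to t raised to some exponent. An exponent of 1 describes independent annual shocks (a random walk), and an exponent of 2 describes a steady drift whose size varies from area to area. The function name and the power-law form are hypothetical.

```python
import math


def variance_ratio(short_years: float, long_years: float, exponent: float) -> float:
    """Ratio of error variances at two horizons when variance ~ t**exponent."""
    return (short_years / long_years) ** exponent


# Independent annual shocks: variance grows linearly with time.
random_walk_ratio = variance_ratio(5, 10, exponent=1.0)

# Steady area-specific drift: error grows linearly, variance quadratically.
drift_ratio = variance_ratio(5, 10, exponent=2.0)

# Exponent that would make the 5-year variance seven-tenths of the 10-year one.
implied_exponent = math.log(0.7) / math.log(5 / 10)

print(random_walk_ratio)   # one-half
print(drift_ratio)         # one-quarter
print(round(implied_exponent, 3))
```

Which exponent applies is an empirical matter; the sketch only shows that each of the ratios mentioned in the text follows from a different, simple error model.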

Two decennial censuses may be used as benchmarks for evaluation pur- 
poses, one at the beginning and one at the end of a 10-year period. The 
former is used prospectively, for making inferences about the accuracy of 
estimates in the current postcensal period. For example, to test the ratio- 
correlation method and component method II for use in the 1970s, the 
Bureau calculated 1970 postcensal estimates based on the census and 
symptomatic data for 1960 to 1970 and then compared those estimates 
with the results of the 1970 census. 

There are two problems in using such comparisons for making in- 
ferences about the actual accuracy of the methods for estimating popula- 
tion after 1970. First, demographic processes continually evolve, so that 
methods that performed well in the 1960s may perform poorly in the 
1970s. In particular, assumptions valid in one decade may not be valid in 
the next, relationships among variables may shift over time, and the qual- 
ity of the available data may also change. 

Second, the 1970 census data that are used for gauging the accuracy of 
the methods are also used for forming and selecting modifications to the 
methods for use in the 1970s. (A good example of this latter use is the 



modification of the ratio-correlation method to allow for trends in
coverage ratios; see Bureau of the Census (1974) and section 2.5 of Appen- 
dix A for details.) Using the same data to evaluate and modify a method 
and then to evaluate the modification can lead to overestimation of the 
modified method’s accuracy. Methods of cross-validation (Mosteller and 
Tukey, 1977) can avoid this problem and should be explored. For exam- 
ple, decennial census data for half of the areas could be used to modify the 
methods, and the data for the other half could be used to assess the ac- 
curacy of the modified methods. 
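
The split-half idea can be sketched as follows. The data are made-up pairs of (postcensal estimate, census count), and the "modification" is deliberately trivial: a single multiplicative correction fitted on one half of the areas. Both the data and the form of the modification are assumptions for illustration; they are not the Bureau's procedure.

```python
from statistics import mean


def mean_abs_pct_error(estimates, counts):
    """Arithmetic mean of percent differences, disregarding sign."""
    return mean(abs(e - c) / c * 100 for e, c in zip(estimates, counts))


def fit_scale(estimates, counts):
    """One-parameter modification: a single multiplicative correction."""
    return sum(counts) / sum(estimates)


# Hypothetical (estimate, census count) pairs for eight areas.
areas = [(980, 1000), (2050, 2100), (4900, 5200), (10100, 10000),
         (19500, 20500), (50500, 52000), (760, 800), (30500, 31000)]

# Split the areas into two halves; a random split would normally be used.
fit_half = areas[0::2]
test_half = areas[1::2]

# Modify the method using only the first half.
scale = fit_scale([e for e, _ in fit_half], [c for _, c in fit_half])

# Error on the data used for fitting (tends to be optimistic).
in_sample_error = mean_abs_pct_error([e * scale for e, _ in fit_half],
                                     [c for _, c in fit_half])

# Error on the held-out half (an honest assessment of the modification).
held_out_error = mean_abs_pct_error([e * scale for e, _ in test_half],
                                    [c for _, c in test_half])

print(round(scale, 4), round(in_sample_error, 2), round(held_out_error, 2))
```

The held-out error is the figure that should be reported as the accuracy of the modified method, since the held-out areas played no part in choosing the correction.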

2.1b COMPARISON WITH SPECIAL CENSUSES 

Use of special censuses avoids some of the problems involved with using 
decennial censuses, and a more continuous monitoring is possible. Special 
censuses are censuses conducted at the request and expense of municipali- 
ties or counties within a state; they are not part of a national effort. 
Special censuses can occur throughout the decade, and so the perfor- 
mance of the estimators can be observed continuously over time. 

A problem with drawing inferences about overall error rates from com- 
parisons with special censuses arises from selection biases. For example, 
the places receiving special censuses tend to have higher-than-average 
growth rates, and it is known that, other things being equal, postcensal 
population estimation methods are less accurate for fast-growing places 
than for moderately growing places.² Also, special censuses do not
generally yield estimates of per capita income. Exceptions to selection bias 
and absence of questions on income occurred in the sample of 86 special 
censuses conducted in 1973 by the Census Bureau to permit evaluation of 
the estimation methodologies for population and per capita income. The 
Bureau paid for these censuses and selected the areas so as to constitute a 
probability sample of local areas with less than 20,000 population. Special 
censuses conducted specifically for evaluation purposes provide the 
strongest evaluation and should be adopted when resources permit. 
However, it is probably prohibitively expensive to take a sufficiently large 
number of special censuses to yield conclusive results. 

It must be recognized that decennial or special censuses are themselves 
subject to error, a fact that should be taken into consideration in evalua- 
tions. Thus the difference between a census enumeration and a postcensal 
estimate is not in general just the error in the postcensal estimate. (Appen- 

² Regression analysis or other techniques of data analysis should be useful for disentangling
the effects of these biases, but modern techniques of data analysis have so far seen little ap- 
plication to the special census comparisons. 



series. For instance, several estimates (obtained from the AR, RC, CM II,
and possibly other methods) of postcensal county populations can be com- 
pared, and the dispersion of the estimates studied. Unfortunately, this in- 
formation is of limited utility because the true population size is unknown: 
the alternative estimates may all lie above the true value, they may all lie 
below it, or they may straddle it. An increase in the extent of dispersion 
over time is often taken as being indicative of the deterioration of the ac- 
curacy of one or more of the estimators over time, but the indication is 
only a weak one. 

2.1d DEMOGRAPHIC AND STATISTICAL LOGIC

A fourth method of evaluation used by the Census Bureau is consideration 
of the demographic and statistical logic of the assumptions underlying the 
estimation methods, along with judgment. For example, some have 
argued that the AR method systematically underestimates the populations
of large central cities (Mann, 1978). This argument is based on logic
rather than on statistical evaluation of the estimates produced by the
method (see section 5.1b(1) for further discussion). Judgment focuses on
the plausibility of the output of the methods. For example, Census Bureau
staff use judgment to decide when to stop incorporating information from
past special censuses into current population estimates provided by the CM
II or RC method (see Appendix A, section 3.11). Appendix C further
analyzes the role of judgment in making postcensal estimates. 


2.2 PANEL EVALUATION OF POPULATION 
ESTIMATES 

This section analyzes empirical evidence about the accuracy of postcensal 
population estimates. Our discussion focuses largely on comparisons of 


these comparisons because the areas receiving special censuses are usually 
self-selected; they are not a random sample of all areas for which 
estimates are made. 

In comparing estimates with the special census counts, we focus on 
three of the criteria of accuracy discussed in section 1.1d:

Criterion 2 — Low average relative error. Relative error is measured by 
the difference between the population estimate and the special census 
count, expressed as a percentage of the count (hereafter referred to simply 
as “percent difference”). Average relative error refers to the arithmetic 
mean of percent differences disregarding sign. 

Criterion 3 — Few extreme relative errors. Extreme error is measured by 
the proportion of percent differences exceeding a specified value, 
disregarding sign, often 10 or 15 percent. 

Criterion 4 — Bias. Bias is measured in terms of the numbers of areas 
whose estimates exceed the special census counts (positive differences) 
and fall below the special census counts (negative differences). 

These are the same criteria used by the Census Bureau in its evaluation 
of state and county population estimates against the 1970 census results 
(Bureau of the Census, 1973b). 
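
The three criteria can be computed directly from paired estimates and special census counts. The sketch below uses four made-up areas; the function names and the data are hypothetical, and the 10 percent cutoff for an extreme error is one of the values mentioned above.

```python
from statistics import mean


def percent_differences(estimates, counts):
    """Estimate minus count, expressed as a percentage of the count."""
    return [(e - c) / c * 100 for e, c in zip(estimates, counts)]


def summarize(estimates, counts, extreme=10.0):
    """Criteria 2, 3, and 4 for a group of areas."""
    diffs = percent_differences(estimates, counts)
    return {
        # Criterion 2: mean of percent differences disregarding sign.
        "average_percent_difference": mean(abs(d) for d in diffs),
        # Criterion 3: share of areas with differences of `extreme` or more.
        "share_extreme": sum(abs(d) >= extreme for d in diffs) / len(diffs),
        # Criterion 4: counts of positive and negative differences.
        "positive": sum(d > 0 for d in diffs),
        "negative": sum(d < 0 for d in diffs),
    }


# Hypothetical estimates and special census counts for four areas.
estimates = [1050, 1900, 5000, 880]
counts = [1000, 2000, 5000, 1000]

summary = summarize(estimates, counts)
print(summary)
```

Areas whose estimate exactly equals the count contribute to neither the positive nor the negative tally, which matches the convention used for the subcounty tabulations.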

Criterion 4 provides information about bias in the population estimates 
for a group of areas — an excessive number of positive differences suggests 
an upward bias in the estimation methodology for the group of areas. It is 
important to remember, however, that the subcounty estimates are con- 
trolled to those for larger geographic areas, so if the estimate for a county 
is too high, the subcounty estimates for that county may appear to be 
biased upward even if they are unbiased estimates of the proportions of 
county population living in the subcounty areas. 


2.2a STATE POPULATION ESTIMATES 

Tests reported by the Bureau of the Census (1974) show that 1970 esti- 
mates for states (derived from the 1960 census and symptomatic data for 

³ The special censuses that were used include those conducted by the Census Bureau (sum-
marized in Current Population Reports, Series P-28) and also those conducted by state or
local agencies and accepted by the Census Bureau.

Census, 1980).


2.2b COUNTY POPULATION ESTIMATES 

Estimates of county population, in terms of average error, are quite ac-
curate. The average percent difference between postcensal estimates and
special census counts was 3.9 percent for 133 counties in which special
censuses were taken during the 3-year period of 1974 through 1976 (Table
2.1). The accuracy of the estimates varied with the 1970 population size of
the county and with the percent change in population size between 1970
and the special census date; large counties were estimated more accurately
than small counties; slowly growing counties were estimated more ac-
curately than declining or rapidly growing counties. For example, the
average percent difference between postcensal estimates and special cen-
sus counts decreased from 7.1 for counties with 1970 population of less
than 1,000 to 1.4 for counties with 1970 population of 100,000 or more.
The average percent differences for large counties (25,000 or more
population) were small, 3 percent or less, regardless of rate of growth or
decline.

When counties were classified by rate of change in population, the most
accurate estimates were for those counties with moderate growth: the
average percent difference was 2.0 for counties that grew by 5 to 10 per-
cent and 2.7 for those that grew by 10 to 15 percent. Counties that had
grown by 25 percent or more had substantially larger errors (an average
percent difference of 6.8), as did counties that had declined in population
by 5 percent or more (an average percent difference of 5.1). The largest er-
rors occurred among counties that were both small and experiencing rapid
growth (Table 2.2). The average percent difference reached a high of 11.2
percent among counties that had less than 5,000 population in 1970 and
had grown by 15 percent or more since 1970.

Because the county estimates are, on the whole, quite accurate, r 



TABLE 2.1 Percent Difference Between Postcensal Estimates of Popu- 
lation and Special Census Counts, by 1970 Population and Percent 
Change in Population Since 1970: 133 Counties With Special Censuses 
Taken Between January 1, 1974, and December 31, 1976 


                                                     Percent of Counties
                                                     With Difference of
1970 Population and         Number     Average
Percent Change in           of         Percent       5.0 Percent   10.0 Percent
Population Since 1970       Counties   Differenceᵃ   or Moreᵃ      or Moreᵃ

All counties                  133         3.9            23             8

By 1970 Population
Less than 1,000                24         7.1            50            25
1,000 to 4,999                 23         5.2            30            17
5,000 to 9,999                 12         3.6            33             0
10,000 to 24,999               20         3.6            25             5
25,000 to 49,999                8         3.5            25             0
50,000 to 99,999               14         2.5             7             0
100,000 or more                32         1.4             0             0

By Percent Change Since 1970
-5.0 percent or more           11         5.1            27            18
-0.0 to -4.9 percent           16         4.0            31             6
+0.0 to +4.9 percent           25         3.3            20             8
+5.0 to +9.9 percent           19         2.0             5             0
+10.0 to +14.9 percent         23         2.7            13             0
+15.0 to +24.9 percent         17         3.6            24             6
+25.0 percent or more          22         6.8            45            23

ᵃ Percent difference for each county equals postcensal estimate (as of July 1) minus adjusted
special census count (interpolated or extrapolated to July 1 of year special census was taken),
expressed as percent of adjusted census count. Average percent difference calculated as
arithmetic mean of percent differences disregarding sign. Counties with differences of 5.0 (or
10.0) percent or more were tallied disregarding sign of differences.

Source: Unpublished data from the Bureau of the Census provided by Frederick
Cavanaugh.


surprising that extreme errors are relatively uncommon except among 
small counties or counties undergoing rapid growth or decline. Of all 
counties with less than 1,000 population, 25 percent had errors of 10 per-
cent or more, as did 17 percent of the counties with 1,000 to 4,999 popula-
tion, 23 percent of the counties that had grown by 25 percent or more
since 1970, and 18 percent of the counties that had declined in population 
by 5 percent or more (Table 2.1). Extreme errors seldom occurred among 
counties with larger populations or slower rates of change. 





                                     -5.0      -0.0      +0.0      +5.0      +15.0
                                     or        to        to        to        or
1970 Population             Total    More      -4.9      +4.9      +14.9     More

All Counties
  Average percent
    differenceᵃ               3.9     5.1       4.0       3.3       2.4       5.4
  Number of counties          133      11        16        25        42        39
  Total with positive
    differences                65       8        10        17        23         7

Less Than 5,000
  Average percent
    differenceᵃ               6.2     5.1       4.3       5.5       3.1      11.2
  Number of counties           47      11         9         9         7        11
  Total with positive
    differences                25       8         5         6         4         2

5,000 to 24,999
  Average percent
    differenceᵃ               3.6       -       4.6       3.4       2.9       4.4
  Number of counties           32       0         4         4        14        10
  Total with positive
    differences                16       -         3         3         7         3

25,000 to 99,999
  Average percent
    differenceᵃ               2.9       -         -       2.5       3.2       3.1
  Number of counties           22       0         0         5         9         8
  Total with positive
    differences                 9       -         -         3         4         2

100,000 or More
  Average percent
    differenceᵃ               1.4       -       2.5       1.4       0.8       1.8
  Number of counties           32       0         3         7        12        10
  Total with positive
    differences                15       -         2         5         8         0


ᵃ Percent difference for each county equals postcensal estimate (as of July 1) minus adjusted
special census count (interpolated or extrapolated to July 1 of year special census was taken),
expressed as percent of adjusted census count. Average percent difference calculated as
arithmetic mean of percent differences disregarding sign.

Source: Unpublished data from the Bureau of the Census provided by Frederick
Cavanaugh.



There also is evidence of bias in the estimation methods: they tend to 
overestimate the population of declining counties and to underestimate 
the population of rapidly growing counties. For example, the estimates 
were too high for 8 of 11 counties that had declined in population by 5 per-
cent or more since 1970, while estimates were too low for 32 of 39 counties
that had grown by 15 percent or more (Table 2.2). That is, the estimation 
methods tend to underestimate the change in population, for both declin- 
ing and increasing populations. 

The patterns noted above also hold, in general, for each of the three in- 
dividual methods that are averaged to obtain the postcensal estimates 
(Table 2.3). The error is, on the average, lower for the postcensal estimate 
than for the individual methods. It should be noted, however, that the dif-
ference between the average error of the postcensal estimates and the AR
estimates is very small and that the average error for the postcensal
estimates is not consistently lower in all the population-size and rate-of-
growth subgroups. This result can occur because the different methods
that are averaged are not equally accurate. Of the three individual
methods the average percent difference is lowest for the AR method (4.0),
next lowest for the RC method (4.9), and highest for CM II (6.4). The pro-
portion of percent differences (disregarding sign) that are 10 percent or
more is smallest (8 percent) for the AR method and the postcensal
estimate, next smallest (13 percent) for the RC method, and largest (21
percent) for CM II (calculated from Tables 2.4 and 2.5).
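
Why an unweighted average need not beat its best component can be seen from a small calculation. The sketch assumes, purely for illustration, that the methods are unbiased and that their errors are independent; under those assumptions the standard deviation of the average of k methods is the square root of the sum of their variances, divided by k. The standard deviations used are hypothetical and are not the values in Table 2.3.

```python
import math


def sd_of_average(sds):
    """SD of the unweighted mean of independent, unbiased estimators."""
    return math.sqrt(sum(s * s for s in sds)) / len(sds)


# Three methods of roughly similar accuracy (hypothetical error SDs).
similar = [4.0, 5.0, 6.0]

# Three methods of very unequal accuracy (hypothetical error SDs).
unequal = [2.0, 6.0, 9.0]

similar_avg = sd_of_average(similar)
unequal_avg = sd_of_average(unequal)

print(f"similar methods: best single {min(similar):.2f}, average {similar_avg:.2f}")
print(f"unequal methods: best single {min(unequal):.2f}, average {unequal_avg:.2f}")
```

When the methods are of similar quality the average improves on the best single method; when one method is much better than the others, giving all three equal weight can make the average worse than that method alone.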

The accuracy of the county estimates is also affected by the age struc- 
ture of the population. Table 2.6 shows that the estimates are less ac- 
curate for counties whose populations had age distributions dissimilar to 
the age distribution for the nation as a whole. The index of dissimilarity 
(Δ), the measure of dissimilar age structure used in Table 2.6, is defined
as

$$\Delta = \frac{1}{2} \sum_{i} \left| p_{ij} - p_{i} \right|$$

where $p_{ij}$ is the percentage of county $j$ population in age category $i$ and $p_{i}$ is
the percentage of the total national population in age category $i$. (The age
categories used are 0-17 years, 18-64 years, and 65 and over.) With one
exception the average percent differences are larger for counties with
dissimilar age structure, that is, with Δ of 5 percent or more. The excep-
tion is the RC estimates for small counties (less than 5,000 population),
where the average percent difference between the estimates and special 
census counts was lower for counties with dissimilar age structure. But 
even for this case the proportion of differences (disregarding sign) that ex- 
ceeded 5 percent was higher for the counties with dissimilar age structure. 
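
A short sketch of the index computation follows. It assumes the conventional definition of the index of dissimilarity, half the sum of absolute differences between two percentage distributions, and uses the three age categories named in the text. The national and county age distributions shown are invented for illustration.

```python
def index_of_dissimilarity(area_pct, national_pct):
    """Half the sum of absolute differences between two percent distributions."""
    return 0.5 * sum(abs(a - n) for a, n in zip(area_pct, national_pct))


# Percent of population aged 0-17, 18-64, and 65 and over (hypothetical values).
national = [34.0, 56.0, 10.0]
retirement_county = [28.0, 54.0, 18.0]
typical_county = [35.0, 55.0, 10.0]

delta_retirement = index_of_dissimilarity(retirement_county, national)
delta_typical = index_of_dissimilarity(typical_county, national)

# Counties with an index of 5 percent or more are treated as dissimilar.
for name, delta in [("retirement", delta_retirement), ("typical", delta_typical)]:
    label = "dissimilar" if delta >= 5.0 else "similar"
    print(f"{name}: index = {delta:.1f} ({label})")
```

The index can be read as the percentage of the county's population that would have to change age category for the county's distribution to match the national one.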



                                            Average Percent Differenceᵃ

1970 Population and         Number     Post-      Compo-     Ratio-     Adminis-
Percent Change in           of         censal     nent       Corre-     trative
Population Since 1970       Counties   Esti-      Method     lation     Records
                                       mateᵇ      II                    Method

All Counties                  133        3.9        6.4        4.9        4.0

By 1970 Population
Less than 1,000                24        7.1       13.6        8.8        6.8
1,000 to 4,999                 23        5.2        8.8        7.2        5.3
5,000 to 9,999                 12        3.6        5.4        5.0        5.3
10,000 to 24,999               20        3.6        5.6        3.7        3.3
25,000 to 49,999                8        3.5        4.1        5.5        2.8
50,000 to 99,999               14        2.5        3.6        3.4        3.2
100,000 or more                32        1.4        2.0        1.7        1.7

By Percent Change Since 1970
-5.0 percent or more           11        5.1       11.8        5.7        7.5
-0.0 to -4.9 percent           16        4.0        6.2        6.0        4.0
+0.0 to +4.9 percent           25        3.3        7.0        4.3        2.9
+5.0 to +9.9 percent           19        2.0        3.8        3.2        2.1
+10.0 to +14.9 percent         23        2.7        4.1        3.0        3.4
+15.0 to +24.9 percent         17        3.6        6.2        3.8        2.6
+25.0 percent or more          22        6.8        8.0        8.8        7.1


ᵃ Percent difference for each county equals postcensal estimate (as of July 1) minus adjusted
special census count (interpolated or extrapolated to July 1 of year special census was taken),
expressed as percent of adjusted census count. Average percent difference calculated as
arithmetic mean of percent differences disregarding sign.

ᵇ Calculated as average of estimates obtained by the three methods (in some states, also in-
cludes a fourth estimate prepared by the states).

Source: Unpublished data from the Bureau of the Census provided by Frederick
Cavanaugh.


Simple stochastic models for error in the estimates lead one to believe 
that the error in the estimates will increase as the length of time since the 
last decennial census increases. However, the hypothesis of increasing er- 
ror over time is difficult to test with the available data because the areas 
receiving special censuses are self-selected, so that differences in esti-

2.2c SUBCOUNTY ESTIMATES

Estimates of the population of subcounty areas in 1975 were quite ac- 
curate for areas with large populations but were increasingly inaccurate as 
population size decreased. For example, the average percent difference 
between 1975 population estimates and comparable 1975 special census 
counts was only 2.6 to 2.7 percent for areas with 25,000 or more popula- 
tion in 1970 but increased to more than 25 percent for areas that had less 
than 250 population in 1970 (Table 2.7). 

The accuracy of the estimates also varied greatly by the rate at which 
the population was changing from 1970 to 1975. Areas with relatively 
stable populations — less than 5 percent growth or decline — had an 
average percent difference of 6 percent, as compared with areas that grew 
by at least 50 percent or that declined by at least 10 percent, which had an 
average percent difference of more than 20 percent. 

The strong patterns exhibited in Table 2.7 — increasing error with 
decreasing size of population and increasing error with increasing rate of 
change in population size — persist when measures of accuracy are cross- 
classified by both variables simultaneously, as shown in Table 2.8. 
Estimates for areas that were both small and subject to rapid growth or 
decline were most inaccurate. For example, the average error was 43 per- 
cent for areas that had less than 500 population in 1970 and whose popu- 
lation had declined by 10 percent or more between 1970 and 1975. The 
average error for areas that grew by 50 percent or more from 1970 to 1975 
was high in all population-size groups except the largest: the average per- 
cent difference decreased from a high of 27 percent for areas with less 
than 500 population to 19 percent for areas with 10,000 to 24,999 popula- 
tion and declined sharply to 7 percent for areas with 25,000 or more 
population. Similarly, very small areas (those with less than 500 popula- 
tion) had large errors regardless of the rate of change in population size:

                                                                Percent Differences (Positive
                                                                or Negative) of

1970 Population and         Average       Number    Positive    10 Per-    15 Per-    25 Per-
Percent Change in           Percent       of        Differ-     cent or    cent or    cent or
Population, 1970-1975       Differenceᵃ   Areas     encesᵇ      More       More       More

All Subcounty Areas           11.7         799        44          34         24         12

By 1970 Population
Less than 50                  26.0          33   }
50 to 249                     27.1         123   }    46          66         52         30
250 to 499                    13.5          67   }
500 to 999                     9.8          77        43          32         17          6
1,000 to 2,499                10.3         118        36          33         21          8
2,500 to 4,999                 7.2         111        39          21         15          5
5,000 to 9,999                 8.2          88        45          26         17          5
10,000 to 24,999               5.2          94        45          10          7          3
25,000 to 49,999               2.6          50        54           2          2          0
50,000 to 99,999               2.6          27   }    50           5          0          0
100,000 or more                2.7          11   }

By Percent Change, 1970-1975
-25.0 percent or more         83.8          15   }    84          77         65         47
-10.0 to -24.9 percent        22.7          42   }
-5.0 to -9.9 percent           9.5          38   }    72          23         15          6
-0.0 to -4.9 percent           6.2          77   }
+0.0 to +4.9 percent           6.4         114        60          18         13          5
+5.0 to +9.9 percent           6.9         104        41          23         12          3
+10.0 to +24.9 percent         7.5         228        34          25         14          5
+25.0 to +49.9 percent        12.0         105        22          44         30          7
+50.0 percent or more         24.1          76         9          68         66         43


ᵃ Percent difference for each area equals postcensal estimate as of July 1, 1975, minus ad-
justed special census count (interpolated or extrapolated to July 1, 1975), expressed as per-
cent of adjusted census count. Average percent difference calculated as arithmetic mean of
percent differences disregarding sign.

ᵇ Percent based on total number of areas with positive or negative difference (that is, total ex-
cluding areas for which the estimate was exactly equal to the adjusted census count); 11 of
the 799 postcensal estimates were exactly equal to the adjusted census counts.

Source: Unpublished data from the Bureau of the Census provided by Frederick
Cavanaugh.



the average percent difference was 13 to 15 percent for areas with
moderate growth or decline and 27 and 43 percent for areas of fast growth 
or decline. Only among areas with 25,000 or more population in 1970 were 
the estimates relatively accurate regardless of rate of change in popula- 
tion; the average percent difference for these areas was 2.4 percent among 
those that changed (growth or decline) by less than 10 percent and 6.6 per- 
cent for areas that grew by 50 percent or more. 

For all 799 subcounty areas (municipalities and townships) in which 
special censuses were taken during 1975 and compared with 1975 popula- 
tion estimates, the overall average difference was 11.7 percent (Table 2.9). 
This overall average, however, reflects the composition of the largely self- 
selected group of subcounty areas in which special censuses were taken 
and may be different from the average for the more than 35,000 munici-
palities and townships eligible for general revenue sharing (GRS).⁴ For ex-
ample, only 38 percent of the 799 special census areas had less than 1,000
population in 1970 as compared with 54 percent of the full set of sub- 

⁴ The 799 subcounty areas for which data are reported in Tables 2.7-2.9 include 426 in which
special censuses were taken by the Census Bureau in 1975 and 373 in which special censuses 
were taken by state or local agencies and accepted by the Bureau. The computer printout list 
from which the tables were compiled was provided by the Census Bureau, but we did con- 
siderable editing prior to our tabulations. 

The computer printout included all special censuses that were adjusted to July 1, 1975 (by
interpolation or extrapolation) and compared with 1975 population estimates; some of these 
censuses were taken in years other than 1975. The printout also had separate listings for 
“balances” of townships that included a municipality and for separate pieces of 
municipalities that straddled township or county boundaries. In all, there were 1,544 com- 
parisons with 1975 estimates on the printout, but 345 of them were based on a single special 
census of the entire state of Massachusetts, which was taken by the state government on 
March 1, 1975. Rather than have our comparisons dominated by one special census (of 
unknown quality) covering every subcounty area in one state, we decided to exclude the com- 
parisons for Massachusetts. In our editing we also dropped 272 comparisons that were based 
on extrapolated counts of special censuses taken in 1974, 56 comparisons for “County 
Balances,” 30 comparisons for areas in which special censuses were taken in another year or 
in both 1974 and another year, and one comparison for which we could not identify the place 
code. We also combined separate pieces of municipalities that straddled township bound- 
aries (47 pieces were combined into 18 municipalities) or county boundaries (23 pieces com- 
bined into 11 municipalities), and substituted 14 township totals for 14 “Township 
Balances” that were listed separately on the computer printout. 
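
The counts given in this note can be reconciled with the 799 areas used in the tables. The short tally below uses only the figures stated above; the variable names and the dictionary labels are ours.

```python
# Comparisons with 1975 estimates listed on the computer printout.
printout_comparisons = 1544

# Comparisons excluded during editing, as described in the note.
dropped = {
    "Massachusetts state census": 345,
    "extrapolated from 1974 special censuses": 272,
    "County Balances": 56,
    "censuses in another year or in 1974 and another year": 30,
    "unidentified place code": 1,
}

# Combining pieces of municipalities reduces the number of records.
merged_across_townships = 47 - 18   # 47 pieces became 18 municipalities
merged_across_counties = 23 - 11    # 23 pieces became 11 municipalities

# Substituting 14 township totals for 14 township balances leaves the count unchanged.
remaining = (printout_comparisons
             - sum(dropped.values())
             - merged_across_townships
             - merged_across_counties)

print(remaining)
```

The tally arrives at the 799 subcounty areas reported in the tables.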

The objectives of the editing process were to obtain a set of comparisons of 1975 popula- 
tion estimates with adjusted special census counts for a set of subcounty areas defined on an 
equivalent basis to general revenue sharing governmental jurisdictions (whole jurisdictions) 
and to limit the adjustment period (for interpolation or extrapolation) to less than 6 months. 
The second objective led us to exclude from our tables comparisons of 1975 estimates with ad-
justed counts (as of July 1, 1975) interpolated or extrapolated from special censuses taken in
any year other than 1975.

Source: Unpublished data from the Bureau of the Census provided by Frederick Cavanaugh.


                                 Percent Distribution        Average Percent Difference
1970 Population and              Special       All
Percent Change in                Census        Subcounty     Unstandard-    Standard-
Population, 1970-1975            Areas         Areas         ized (a)       ized (b)

All Subcounty Areas              100.0         100.0         11.7           12.3

By 1970 Population
  Less than 1,000                 37.5          55.8         19.5           17.0
  1,000 to 4,999                  28.7          29.0          8.8            7.2
  5,000 to 9,999                  11.0           6.4          8.2            7.0
  10,000 to 49,999                18.0           7.3          4.3            3.8
  50,000 or more                   4.8           1.5          2.6            2.6

By Percent Change, 1970-1975
  -10.0 percent or more            7.1           9.1         38.8           38.1
  -0.0 to -9.9 percent            14.4          24.4          7.3            9.3
  +0.0 to +4.9 percent            14.3          19.4          6.4            7.9
  +5.0 to +9.9 percent            13.0          15.3          6.9            9.0
  +10.0 to +49.9 percent          41.7          30.0          8.9           10.8
  +50.0 percent or more            9.5           1.7         24.1           24.4


a Average percent differences for 799 subcounty areas in which special censuses were taken in
1975, calculated as in Tables 2.7 and 2.8.

b Average percent differences calculated by reweighting the averages using the “size and
percent change in size” composition of all subcounty areas for which population estimates were
made by the Census Bureau. Thus the average percent difference for “all subcounty areas” is
reweighted by the cross-classified “population size by percent change in population”
composition of all subcounty areas for which estimates were made. Similarly, the average percent
difference for each “population size” group is reweighted by the “percent change in
population” composition of all subcounty areas in that size group. And the average percent
difference for each “percent change” group is reweighted by the population size composition of
all subcounty areas in that “percent change” group.

source: Unpublished data from the Bureau of the Census provided by Meyer Zit 
Frederick Cavanaugh. 




difference for subcounty areas with less than 1,000 population was 19.5 as 
compared with a reweighted average percent difference of 17.0. This dif- 
ference arises because the areas in which special censuses were taken con- 
tain a larger proportion of fast-growing areas than all subcounty areas for 
which population estimates are made, and fast-growing areas are subject 
to larger error than other areas. In general, the reweighting has little im- 
pact on the overall average error and on the pattern of error by population 
size and by rate of change in population. 

As in the case of the county estimates, there is strong evidence of bias in 
the subcounty estimates. The estimation method consistently tends to 
underestimate the population of growing areas and to overestimate the 
population of declining areas. This can be seen in the third column of 
Table 2.7, which reports the proportion of differences between estimates 
and special census counts that were positive (i.e., overestimates). For ex- 
ample, 84 percent of the estimates for areas that declined in population by 
10 percent or more between 1970 and 1975 were overestimates, as were 72 
percent of the estimates for areas that declined by less than 10 percent. 
Similarly, 91 percent of the estimates for areas that had grown by 50 per- 
cent or more were underestimates, as were 78 percent of the estimates for 
places that had grown by 25 to 49 percent. 

The low levels of accuracy of the estimates for small areas, and for areas 
undergoing rapid growth or decline, are evident in the measures of ex- 
treme error in the last three columns of Table 2.7. Among areas with less 
than 500 population, two-thirds (66 percent) had differences between 
population estimates and census counts of at least 10 percent, more than 
one-half (52 percent) had differences exceeding 15 percent, and almost 

'‘’The Census Bureau provided the Panel with a cross-tabulation of grs areas by size of 
population (1970) and percent change in population (1970-1975). Standardization for 1970 
population size alone increased the average difference from 11.7 to 14.3 percent; standardi- 
zation for 1970-1975 change in population alone decreased the average difference to 10.7 
percent. 
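
The single-factor standardizations described in this footnote can be approximated from the group figures printed in Table 2.9. The following is a minimal sketch, not the Census Bureau's or the Panel's procedure: the function name `standardized_average` is an assumption, and the inputs are the rounded group averages and the percent distribution of all subcounty areas as read from the table, so the results agree with the published 14.3 and 10.7 only to rounding.

```python
def standardized_average(group_averages, weights_percent):
    """Reweight group averages by a percent distribution (direct standardization)."""
    total_weight = sum(weights_percent)
    weighted_sum = sum(a * w for a, w in zip(group_averages, weights_percent))
    return weighted_sum / total_weight


# Unstandardized average percent differences for the special census areas,
# by 1970 population size group (figures read from Table 2.9).
avg_by_size = [19.5, 8.8, 8.2, 4.3, 2.6]
# Percent distribution of ALL subcounty areas across the same size groups.
all_areas_by_size = [55.8, 29.0, 6.4, 7.3, 1.5]

# The same two series, by percent change in population, 1970-1975.
avg_by_change = [38.8, 7.3, 6.4, 6.9, 8.9, 24.1]
all_areas_by_change = [9.1, 24.4, 19.4, 15.3, 30.0, 1.7]

size_standardized = standardized_average(avg_by_size, all_areas_by_size)
change_standardized = standardized_average(avg_by_change, all_areas_by_change)

print(round(size_standardized, 1))
print(round(change_standardized, 1))
```

Standardizing on size alone raises the average because small places, which have the largest errors, are more common among all subcounty areas than among the special census areas; standardizing on percent change alone lowers it because fast-growing places are less common.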


areas that experienced rapid population growth or that declined in
population between 1970 and 1975. Over three-fourths of the areas that
declined by 10 percent or more had errors of at least 10 percent, and
almost one-half had errors of at least 25 percent. Similarly, of the areas
that grew by 50 percent or more, two-thirds had errors of at least [...] per-
cent, and 43 percent had errors of 25 percent or more. Among areas that
grew by 25 to 49 percent, 30 percent had errors of 15 percent or more.
(The detailed distributions by size and direction of percent differences are
reported in Tables 2.10 and 2.11 for subcounty areas classified by popula-
tion in 1970 and by percent change in population, 1970-1975.) 

It should be noted that estimates for counties are considerably more ac-
curate than estimates for subcounty areas of the same size and rate of
change in population. For example, counties with 1,000 to 4,999 popula-
tion had an average percent difference of only 5.2 as compared with a dif-
ference of 8.8 percent for subcounty areas of the same size (see Tables [...]
and 2.9). 

Thus far our evaluation of the accuracy of the population estimation
methods has been based on percent differences between the estimates of
total population and special census counts. The Census Bureau's
evaluations of their estimates have been based on similar measures (see,
for example, Bureau of the Census (1973b, 1980)). Two considerations
suggest, however, that the estimation methods should also be evaluated in
terms of the accuracy with which they measure change in population since
the last decennial census. First, the methods are designed to estimate
change in population since the last census: estimates of total population
are produced by adding the estimated change in population to the
previous census counts. Second, the usefulness of the estimates
for the purpose of allocating general revenue sharing funds between
regular censuses depends on the accuracy of the estimated change in
population. If the estimated change in population for a significant
number of areas is in the wrong direction, or if the average error in the
estimated change is excessively large, it may be preferable to use the
census counts for allocation purposes. 

Therefore it is worth noting that percent differences between estimated
change and enumerated change in population would be much larger than
the percent differences between total population estimates and enumer-
ated census counts that are summarized in Tables 2.7-2.11. Moreover,
the pattern of differences would be substantially altered, since 


average percent differences based on total population (from Table 2.8) to 
average percent differences based on change in population. The following 
table gives the average percent differences between postcensal estimates 
and special census figures for 1975: 


                               Percent Change in Population, 1970-1975
                       -10.0     -0.0     +0.0    +10.0    +25.0    +50.0
                          or       to       to       to       to       or
1970 Population         More     -9.9     +9.9    +24.9    +49.9     More

Less than 500
  based on change        243      260      317       91       73       64
  (based on total)     (42.9)   (13.7)   (15.1)   (13.5)   (20.0)   (27.4)

500 to 2,499
  based on change        113      148      118       54       42       51
  (based on total)     (19.9)    (7.8)    (5.6)    (8.1)   (11.5)   (21.9)

2,500 to 9,999
  based on change         77      127      111       34       41       62
  (based on total)     (13.6)    (6.7)    (5.3)    (5.1)   (11.3)   (26.4)

10,000 to 24,999
  based on change         31       82       67       33       22       44
  (based on total)      (5.5)    (4.3)    (3.2)    (4.9)    (6.0)   (18.8)

25,000 or more
  based on change         17       46       50       17       12       15
  (based on total)      (3.0)    (2.4)    (2.4)    (2.5)    (3.4)    (6.6)


Algebraically, if $C$ = 1975 special census count (adjusted), $E$ = 1975 population esti-
mate, and $P$ = 1970 population, then $|(E - C)/C| \times 100$ equals the percent difference between
estimate of total population and total census count (disregarding sign) and
$|[(E - P) - (C - P)]/(C - P)| \times 100 = |(E - C)/(C - P)| \times 100$, which equals the percent difference be-
tween estimated and enumerated change in population (disregarding sign). Note also that

$$\left|\frac{E - C}{C - P}\right| \times 100 = \frac{\text{percent difference for total estimate}}{\left(\text{percent change in population, 1970-1975}\right) \times \left(\dfrac{\text{1970 population}}{\text{1975 count}}\right)} \times 100 .$$
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and 1970 Population 

                                      Percent Change in Population, 1970-1975
                                      -10.0     -0.0     +0.0    +10.0    +25.0    +50.0
                                         or       to       to       to       to       or
                             Total     More     -9.9     +9.9    +24.9    +49.9     More

Less than 500 population       223       50       25       42       53       25       28
  -25.0 percent or more         31        2        1        2        8        4       14
  -15.0 to -24.9 percent        28        2        1        4        6       10        5
  -10.0 to -14.9 percent        19        1        2        6        7        1        2
  -5.0 to -9.9 percent          20        2        0        2        8        4        4
  -0.0 to -4.9 percent          17        0        3        3        8        3        0
  Exact 0                       10        2        1        3        2        0        2
  +0.0 to +4.9 percent          16        3        6        3        4        0        0
  +5.0 to +9.9 percent          12        1        3        5        3        0        0
  +10.0 to +14.9 percent        12        5        2        3        2        0        0
  +15.0 to +24.9 percent        21        8        2        5        3        2        1
  +25.0 percent or more         37       24        4        6        2        1        0

500 to 2,499 population        195        2       24       50       61       29       29
  -25.0 percent or more         12        0        0        0        0        1       11
  -15.0 to -24.9 percent        17        0        0        1        5        4        7
  -10.0 to -14.9 percent        16        0        1        2        8        5        0
  -5.0 to -9.9 percent          44        0        2        7       20       10        5
  -0.0 to -4.9 percent          30        0        3       13       10        2        2
  Exact 0                        1        0        0        1        0        0        0
  +0.0 to +4.9 percent          34        0        6       13        8        4        3
  +5.0 to +9.9 percent          22        1        7        7        6        0        1
  +10.0 to +14.9 percent        10        0        2        3        3        2        0
  +15.0 to +24.9 percent         6        0        2        3        1        0        0
  +25.0 percent or more          3        1        1        0        0        1        0

2,500 to 9,999 population      199        1       29       72       57       27       13
  -25.0 percent or more          4        0        0        0        0        0        4
  -15.0 to -24.9 percent        11        0        0        0        2        5        4
  -10.0 to -14.9 percent         7        0        0        0        2        5        0
  -5.0 to -9.9 percent          37        0        0       15       12        8        2
  -0.0 to -4.9 percent          57        0        7       25       20        4        1
  Exact 0                        0        0        0        0        0        0        0
  +0.0 to +4.9 percent          46        0       12       17       16        1        0
  +5.0 to +9.9 percent          13        0        3        7        2        1        0
  +10.0 to +14.9 percent         7        1        2        4        0        0        0
  +15.0 to +24.9 percent        12        0        4        3        2        3        0
  +25.0 percent or more          5        0        1        1        1        0        2



                                      Percent Change in Population, 1970-1975
                                      -10.0     -0.0     +0.0    +10.0    +25.0    +50.0
Percent Difference                       or       to       to       to       to       or
and 1970 Population          Total     More     -9.9     +9.9    +24.9    +49.9     More

10,000 to 24,999 population     94        2       16       25       32       15        4
  -25.0 percent or more          2        0        0        0        0        0        2
  -15.0 to -24.9 percent         1        0        1        0        0        0        0
  -10.0 to -14.9 percent         1        0        0        0        0        1        0
  -5.0 to -9.9 percent           9        0        0        2        2        5        0
  -0.0 to -4.9 percent          39        1        7       10       14        5        2
  Exact 0                        0        0        0        0        0        0        0
  +0.0 to +4.9 percent          27        0        4       11        9        3        0
  +5.0 to +9.9 percent          10        1        4        1        4        0        0
  +10.0 to +14.9 percent         1        0        0        0        0        1        0
  +15.0 to +24.9 percent         3        0        0        1        2        0        0
  +25.0 percent or more          1        0        0        0        1        0        0

25,000 or more population       88        2       21       29       25        9        2
  -25.0 percent or more          0        0        0        0        0        0        0
  -15.0 to -24.9 percent         0        0        0        0        0        0        0
  -10.0 to -14.9 percent         0        0        0        0        0        0        0
  -5.0 to -9.9 percent           6        0        0        1        2        1        2
  -0.0 to -4.9 percent          36        1        4       12       15        4        0
  Exact 0                        0        0        0        0        0        0        0
  +0.0 to +4.9 percent          40        0       15       15        7        3        0
  +5.0 to +9.9 percent           3        1        1        0        0        1        0
  +10.0 to +14.9 percent         2        0        1        0        1        0        0
  +15.0 to +24.9 percent         1        0        0        1        0        0        0
  +25.0 percent or more          0        0        0        0        0        0        0


“Percent difference for each area equals postcensal estimate as of July 1 minus adjusted 
special census count (interpolated or extrapolated to July 1), expressed as percent of adjusted 
census count. 


source: Unpublished data from the Bureau of the Census provided by Frederick 
Cavanaugh. 


The above calculations were made by assuming that the percent change in 
population for all subcounty areas in each size-percent change subgroup 
in Table 2.8 was exactly the midpoint of the percent change interval; it 
was also assumed that all areas that declined in population by 10 percent 
or more had declined by exactly 15 percent and that all areas that in- 
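creased in population by 50 percent or more increased by exactly 75 per- 
cent. For example, the percent difference based on change in population 
for areas with 500 to 2,499 population that grew by less than 10 percent
(assumed change of 5.0 percent, average percent difference of 5.6 based on
total population) is

$$\frac{5.6}{(5.0)\,(1.00/1.05)} \times 100 = \frac{5.6 \times 1.05}{5.0} \times 100 \approx 118 .$$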
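
The illustrative conversion from a percent difference based on total population to one based on change in population can be sketched in a few lines of Python. The function and argument names below are assumptions made for illustration and are not part of the Panel's calculations; the formula is the identity stated in the algebraic footnote to the illustrative table, with the percent change set to the assumed interval midpoint.

```python
def change_based_difference(total_pct_diff, pct_change):
    """Percent difference based on change, derived from the total-based figure.

    total_pct_diff: 100 * |E - C| / C  (E = estimate, C = 1975 count)
    pct_change:     100 * (C - P) / P  (P = 1970 population); must be nonzero
    """
    if pct_change == 0:
        raise ValueError("undefined when the enumerated change is zero")
    count_ratio = 1.0 + pct_change / 100.0  # equals C / P
    return total_pct_diff * count_ratio / abs(pct_change) * 100.0


# Areas with 500 to 2,499 population, assumed to have grown by exactly 5 percent.
print(round(change_based_difference(5.6, 5.0)))
# Same size group, assumed to have declined by exactly 15 percent.
print(round(change_based_difference(19.9, -15.0)))
```

Because the divisor is the percent change itself, the change-based figure grows without bound as the assumed change approaches zero, which is why slowly changing areas show the largest values.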

The patterns of error in the illustrative calculations are strikingly dif-
ferent from those in Table 2.8. Subcounty areas subject to little growth or
decline have the largest percent differences based on change in popula-
tion, whereas the fast-growing areas have much smaller percent dif-
ferences. From this perspective the greater accuracy documented in
Tables 2.7-2.11 for areas of slow or moderate change in population is
explained by the fact that their change in population from 1970 to 1975
was a smaller proportion of their total population in 1975 than was the
case for areas undergoing more rapid rates of growth or decline.

It is also worth noting how very large the relative errors are when they
are based on change in population rather than on total population. For
example, the rather moderate average percent difference of 5.3 (based on
total population) for areas with 2,500 to 9,999 population (in 1970) that
grew by less than 10 percent between 1970 and 1975 represents an average
difference of 111 percent when the error is measured in terms of change in
population size (and when it is assumed that all areas increased by exactly
5 percent). Similarly, the average percent difference of 5.1
(based on total population) for areas of the same size that grew by 10 to 24.9
percent between 1970 and 1975 represents an average difference of 34 per-
cent when the error is measured in terms of change in population.
Although these illustrative calculations are not based on real data for in-
dividual subcounty areas, they are true measures of the percent dif-
ferences based on change in population under the stated assumptions,
namely, that all subcounty areas in each subgroup of areas in Table 2.8
had the same percent change in population, the midpoint of the percent 
change interval. 

We have made some real calculations for one population-size group of 
subcounty areas. Percent differences between estimated and enumerated 
change in population for 118 subcounty areas that had 1,000 to 2,499 
population in 1970 are reported in Table 2.12. The pattern of differences 
based on change in population is quite similar to the illustrative calcula- 
tions derived above for areas with 500 to 2,499 population. The more 
detailed classification by percent change in population for 1970-1975 
(nine intervals instead of six) provides us with separate measures for areas 
with little change in population (less than 5 percent increase or decrease). 
The average percent differences (based on change in population) for these 
two groups of areas were exceedingly high (247 and 331 percent), largely 
because of the very small base of the percent differences for individual 
areas in these two groups. The average percent differences for areas that 
increased or decreased by 5 to 10 percent during 1970-1975 were also very 
high: 100 percent or more. Although the average error decreased as the 
rate of population growth increased, it remained as high as 49 percent for 
areas that grew by 25 percent or more between 1970 and 1975. 

In evaluating the accuracy of estimates of postcensal change in popula- 
tion, it is also important to take into account whether the estimated 
change is in the correct direction, that is, does it correctly estimate 
whether the population increased or decreased. The average percent dif- 
ferences in Table 2.12 are averages of unsigned percent differences. Thus 
an average (based on change in population) that exceeds 100 percent in- 
dicates that the error in the estimate of change was larger (on the average) 
than the enumerated change in population: the estimate of change either 
was in the wrong direction or overestimated the magnitude of change by 
more than 100 percent. The third column of Table 2.12 reports the 
number of subcounty areas for which the percent difference based on 
change in population exceeded 100 percent, and the fourth column in- 
dicates the number of areas in which the estimated change in population 
was in the wrong direction (increase instead of decrease or vice versa). In 
20 of the 118 areas the estimated change was in the wrong direction. In 10 
areas that actually decreased in population according to special census 
counts, the estimates showed an increase in population; most (8 of 10) of 
these areas had declined in population by less than 5 percent, but the esti- 
mated increase for 3 of these 8 areas exceeded 5 percent. Similarly, in 
another 10 areas that actually increased in population, the estimates 


                                                    Number of Areas With
                                                    Percent Difference
                     Average                        Greater Than 100.0 (b)       Average
                     Percent                                     Estimate        Percent
Percent Change       Difference      Total                       of Change       Difference
in Population,       Based on        Number                      in Wrong        Based on
1970-1975            Change (a)      of Areas       Total        Direction       Total (c)

-10.0 or more           140              2             1             1             19.9
-5.0 to -9.9            114              4             2             1              7.7
-0.0 to -4.9            331             11             8             8              4.7
+0.0 to +4.9            247             13             6             2              3.4
+5.0 to +9.9            100             13             4             2              6.4
+10.0 to +14.9           64             17             3             2              7.2
+15.0 to +24.9           63             19             4             4             10.7
+25.0 to +49.9           49             20             1             0             12.2
+50.0 or more            49             19             0             0             20.8

TOTAL                   111            118            29            20             10.3

Total (excluding
-4.9 to +4.9
percent change)          87            103            14            10             10.9


a Percent difference for each area equals estimated change in population, 1970-1975 (1975 
estimate minus 1970 population) minus enumerated change in population (1975 adjusted 
census count minus 1970 population), expressed as percent of enumerated change in popula- 
tion. Average calculated as arithmetic mean of percent differences disregarding sign. 

b Refers to percent difference based on change in population. 

c Percent difference for each area equals 1975 postcensal estimate minus adjusted special 
census count (interpolated or extrapolated to July 1), expressed as percent of adjusted census 
count. Average calculated as arithmetic mean of percent differences disregarding sign. 

source: Unpublished data from the Bureau of the Census provided by Frederick 
Cavanaugh. 
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decreases of 1 to 9 percent. In addition to the 20 areas for which the 
estimated change was in the wrong direction, there were 9 subcounty areas 
for which the change in population was overestimated by more than 100 
percent (8 were overestimates of population increase, and 1 was an 
overestimate of population decline). 
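
The two checks applied to the 118 areas (whether the percent difference based on change exceeds 100 percent, and whether the estimated change has the wrong sign) can be expressed as a short routine. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the three areas at the bottom are invented figures for illustration, not data from Table 2.12.

```python
def classify_change_error(pop_1970, estimate_1975, count_1975):
    """Return (percent difference based on change, wrong-direction flag).

    The percent difference is undefined (None) when the enumerated change is zero.
    """
    estimated_change = estimate_1975 - pop_1970
    enumerated_change = count_1975 - pop_1970
    if enumerated_change == 0:
        return None, False
    pct_diff = 100.0 * abs(estimated_change - enumerated_change) / abs(enumerated_change)
    # Opposite signs mean an increase was estimated where a decrease occurred, or vice versa.
    wrong_direction = estimated_change * enumerated_change < 0
    return pct_diff, wrong_direction


# Hypothetical areas: (1970 population, 1975 estimate, 1975 adjusted count).
hypothetical_areas = [
    (2000, 2060, 1950),  # estimate shows growth, count shows decline
    (1500, 1700, 1580),  # right direction, growth overestimated by more than 100 percent
    (1200, 1290, 1300),  # right direction, small error
]
for area in hypothetical_areas:
    print(area, classify_change_error(*area))
```

Any estimate in the wrong direction necessarily has a percent difference above 100, which is why the wrong-direction column of Table 2.12 is a subset of the column counting differences greater than 100 percent.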

These individual calculations for 118 subcounty areas, together with the 
illustrative calculations reported earlier, raise questions about the advisa- 
bility of attempting to update population data for purposes of allocating 
funds to small areas. It seems quite possible that the estimated postcensal 
change in population for a substantial proportion of areas below some as 
yet unspecified threshold may be in the wrong direction or may have 
average errors in excess of 100 percent. This possibility should be carefully 
checked in tests of the estimation methods against the 1980 census results. 

It is also probable that factors other than population size and rate of 
population change — for example, age structure of the population— affect 
the accuracy of the subcounty estimates. These relationships should be 
further explored when the 1980 census results are available. 


2.3 PANEL EVALUATION OF PER CAPITA INCOME 
ESTIMATES 

Tests of per capita income estimates are performed with two considera- 
tions in mind: (1) accuracy of the per capita income estimates as used for 
GRS and (2) accuracy of the estimates of postcensal change in per capita 
income. The second consideration is relevant for evaluating the estimation 
methodology for postcensal per capita income. The basis of our evaluation 
is a sample of 86 special censuses, taken in 1973 at the Census Bureau’s 
expense, in which income questions were asked of the entire enumerated 
populations. 

Although the same tests are performed to evaluate accuracy for points 1 
and 2, the test results are interpreted differently. As was noted above, the 
postcensal estimate of per capita income level for an area equals the sum 
of the 1970 census estimate and the estimate of postcensal change. Since 
the 1970 census estimates are based on 20-percent samples of respon- 
dents, they are subject to sampling error. Thus the estimates of postcensal 
level contain error from the estimation of change, and they also contain 
sampling error from the 1970 estimates. The effect of the latter error 
needs to be eliminated when one makes inferences about point 2. 

Concern over accuracy of GRS allocations leads us to focus on accurate 
estimation of the ratio of subcounty (or county) per capita income to 



our attempts at evaluation of accuracy for points 1 or 2. Although the
data are not based on sampling but on attempted complete enumeration,
nonresponse to questions on income and biased response (e.g., under-
reporting of income) both introduce error. Previous studies have indicated
that income was underreported in the 1970 census by about 8 percent for
the nation as a whole (Ono, 1972). The underreporting varied significantly
by type of income (wage and salary income, farm income, etc.), and dif-
ferential errors among places were substantial. Underreporting and
nonresponse undoubtedly also mar the special census figures, but we can
only guess at the extent.

Thus the special census data on per capita income are inaccurate to
some unknown degree, and the difference between the special census
figure and the postcensal estimate of per capita income for a place may
not be caused entirely by error in the latter. Alternatively, the errors in the
two figures can conceivably offset each other, so that their difference may
on occasion underestimate the error in the postcensal estimate.

Our comparison of the results of the 1973 special censuses to the post-
censal estimates for the same date is shown in Table 2.13. The second col-
umn shows the percentage of areas for which the postcensal estimate was
closer to the special census figure than was the 1970 census figure. It
should be noted (see the third and fourth columns) that the postcensal esti-
mates of level for the smallest places (1970 population under 1,000) are not
much better estimates of level than are the 3-year-old 1970 census
estimates, despite inflation. Since inflation causes the per capita income
levels to rise more or less uniformly for most places, much of the
discrepancy between the 1970 census and 1973 special census figures

The denominators given here are approximate; see Appendix E for the actual, more com-
plicated expressions.

For the special censuses, nonrespondents were assumed to have the same income as
respondents. For the 1970 census, more sophisticated imputation techniques were used. Fay
and Herriot (1979) suggest that those techniques cause a relative downward bias in the
special census estimates.


TABLE 2.13 Comparison of Special Census Per Capita Income
Estimates With 1970 Estimates and Postcensal Estimates: Original
Methodology

                                   Percent of Areas for Which     Average Percent
                                   1973 Postcensal Estimate       Difference From
                       Number      Closer Than 1970 Census        1973 Special Census (a)
1970 Population        of          to 1973 Special Census         1970         1973
of Places              Places                                     Census       Estimate

Under 500                16                 62                      25           28
500-999                  11                 45                      15           17
1,000-4,999              46                 65                      15           10
5,000-9,999               9                 89                      15            8
10,000-20,000             4                100                      21           14

TOTAL                    86                 66                      17           15
Total above 500          70                 67                      15           11
Total above 1,000        59                 71                      15           10


a Percent difference for each place equals postcensal estimate minus special count, expressed 
as percent of census count. Average percent difference calculated as arithmetic mean of per- 
cent differences disregarding sign. 


source: Unpublished data from the Bureau of the Census provided by Roger Herriot. 


would disappear if ratios of income levels were the focus of comparison. 
We note, for example, that for only 9 of the 86 areas did the 1973 census 
per capita income figure fall below the 1970 census figure. We suspect 
that the postcensal estimates would look even worse if we could similarly 
compare the ratios of place per capita income to county per capita in- 
come. As we mentioned above, for general revenue sharing, ratios rather 
than levels of per capita income are relevant (see Appendix E). 

The methodology underlying the postcensal per capita income estimates 
analyzed in Table 2.13 was modified later in the 1970s. The Census 
Bureau originally estimated per capita income for subcounty units with 
1970 population below 500 by the estimated per capita income for the 
whole county; for subcounty units with 1970 population of 500 to 999 the 
Bureau estimated per capita income by attributing to these units the 
estimated rate of change for the aggregate of all areas in the county with 
under 10,000 population in 1970. Beginning with the estimates of per 
capita income for 1974, empirical Bayes techniques and other modifica- 
tions were used to revise the procedures for these very small places 
(population under 1,000). Fay and Herriot (1979, Table 3) recomputed 






to the special census was unchanged. The revised methodology apparently
improves the accuracy of the postcensal estimates. Other tests, such as the
“Groups of Ten Test” discussed by Fay and Herriot (1979) (section 4 [...]
the Bureau of the Census (1980)), show that the revised methodology im-
proves the 1970 base estimates as well. Nevertheless, we conclude that for
GRS purposes the use of postcensal estimates of per capita income for the
smallest places may not be substantially more accurate than the use of
1970 census estimates (especially if the latter are adjusted by empirical
Bayes techniques), although this conclusion might not hold for longer
time periods. 

In evaluating the estimates of postcensal change in per capita income,
as was mentioned above, the comparisons of postcensal estimates of
change with censal estimates of change are confounded by underreport-
ing errors and nonresponse errors. It is sometimes hypothesized that the
errors arising from underreporting of income are stable over time so that
these errors cancel when one considers changes in the estimates over time.
For example, if underreporting caused per capita income for an area to be
underestimated by $200 both by the 1970 census and by the 1973 special
census, then the errors cancel, and the difference between the two
estimates accurately measures the true change in per capita income. In
this case the difference between the 1970 census and 1973 special census
estimates would be a good standard for assessing the accuracy of the up-
dates. However, such neat cancellation of underreporting and nonre-
sponse errors may be more hoped for than real (see Appendix I).

In addition to underreporting and nonresponse errors, sampling error
contributes to the inaccuracy of the difference between 1973 special cen-
sus and 1970 census estimates as an estimate of postcensal change in per


Fay and Herriot (1979, Table 3) classify areas by the 1970 census weighted sample popula-
tion rather than the 1970 census count, as we do. The classifications are the same for all
areas except Bonaparte, Iowa, which had a 1970 population of 517 but a weighted sample
population under 500.

That is, comparison of the difference of the postcensal estimate minus the 1970 census
estimate with the difference of the 1973 special census estimate minus the 1970 census
estimate. 
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capita income." Unpublished calculations by the Census Bureau indicate 
that the coefficient of variation due to sampling for an area was approx- 
imately 3.0 divided by the square root of the size of the area’s 1970 
population. For small areas the coefficient of variation is large: .09 for an 
area with 1,000 population and .30 for an area with 100. The presence of 
these errors implies that one cannot simply interpret deviations between 
the “census change” (the difference between the 1973 special census 
estimate and the 1970 census estimate) and the estimated change (the dif- 
ference between the postcensal estimate and the 1970 census estimate) as 
evidence of error in the estimate of postcensal change. To estimate this er- 
ror, it would be necessary to separate out the other components of error — 
nonresponse, underreporting, and sampling error in the census 
estimates.'^ We do not undertake this task here, but possible approaches 
are noted in section 3.3. 
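
The approximation quoted above for the sampling coefficient of variation is easy to tabulate. The sketch below simply evaluates 3.0 divided by the square root of the 1970 population; the function name and the choice of example population sizes are assumptions for illustration, and the constant 3.0 is the approximate figure cited from the Census Bureau's unpublished calculations.

```python
import math


def sampling_cv(pop_1970, constant=3.0):
    """Approximate coefficient of variation of the 1970 sample-based estimate."""
    if pop_1970 <= 0:
        raise ValueError("population must be positive")
    return constant / math.sqrt(pop_1970)


for population in (100, 500, 1000, 10000):
    print(population, round(sampling_cv(population), 3))
```

Under this approximation, halving the coefficient of variation requires a fourfold increase in population, which is why sampling error dominates for the smallest places.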


''Although the 1970 census estimate enters into both of the quantities being compared, the 
difference between the postcensal estimate of level and the 1970 census estimate is the actual 
estimate of change, but the difference between the 1973 special census estimate and the 1970 
census estimate is an inaccurate estimate of true change because of sampling error in the 
1970 census estimates. 

'^An extensive bibliography on error components is given by Sahai (1979). 



3.1 LOSS FUNCTIONS AND OPTIMIZATION CRITERIA 


Chapter 1 (section 1.1d) discussed four criteria of accuracy along with the
likelihood of conflicts among them. The four criteria are (1) low average
error, (2) low average relative error, (3) few extreme relative errors, and
(4) absence of bias for subgroups.

An explicit, concise, and useful way to summarize weightings of ac-
curacy criteria is to formulate loss functions or optimization criteria.
These devices are designed so that choosing a procedure to minimize them
corresponds to selecting an estimation procedure best satisfying the ac-
curacy criteria and the preferred trade-offs among them. For example, a
familiar optimization criterion for estimating a single parameter is mean
square error; one chooses the estimator with the smallest mean square er-
ror. 

Before giving illustrations of loss functions and optimization criteria,
we note two applications for small-area estimation. One application is
selecting the “best” from a class of alternative estimators for which data
are available. Consider, for example, choosing among weighted averages
of two estimators, one having low average relative error but some extreme
errors and the other with higher average relative error but no extreme er-
rors. One can choose the weightings in the average so as to minimize a
specified optimization criterion. This is a familiar statistical problem.
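
Choosing the weight in such an average can be sketched as a one-dimensional search. The example below is illustrative only: the criterion (a sum of relative errors raised to a power `alpha`), the grid search, the function names, and all of the population figures are assumptions introduced here, not the Panel's procedure or data.

```python
def criterion(estimates, actual, alpha=2.0):
    """Sum over areas of (absolute relative error) ** alpha."""
    return sum((abs(e - p) / p) ** alpha for e, p in zip(estimates, actual))


def best_weight(est_a, est_b, actual, alpha=2.0, steps=100):
    """Grid search for w in [0, 1] minimizing the criterion of w*A + (1 - w)*B."""
    best_w, best_value = None, None
    for k in range(steps + 1):
        w = k / steps
        combined = [w * a + (1.0 - w) * b for a, b in zip(est_a, est_b)]
        value = criterion(combined, actual, alpha)
        if best_value is None or value < best_value:
            best_w, best_value = w, value
    return best_w, best_value


# Hypothetical data: estimator A is close everywhere except one extreme error;
# estimator B is off by about 5 percent everywhere but has no extreme error.
actual = [1000, 2000, 5000, 800]
est_a = [1010, 1990, 5020, 960]
est_b = [1050, 1900, 5250, 760]

w, value = best_weight(est_a, est_b, actual)
print(w, value)
```

With these invented figures the search settles on an interior weight, so the combined estimator scores better on the chosen criterion than either estimator alone; a different criterion or exponent would generally select a different weight.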

A second application relates to collecting data and designing estimators
for which the data must be gathered. This use of optimization criteria
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necessary data are essentially different from the costs of sample data. For 
instance, sample estimates with given coefficients of variation for small 
and large places may cost approximately as much for small places as for 
large places. But AR method estimates, based on tax returns, may be more 
expensive for small places than for large places if coefficients of variation 
are required to be equal. The extra cost for small places arises because 
boundary changes and geographic coding problems are generally more 
important for small places than for large places. 

Let $P_i$ and $\hat{P}_i$ denote the actual¹ and estimated population of the $i$th 
local area, for a total of $n$ local areas. Consider the second criterion, low 
average relative error. Attaining that criterion is equivalent to minimizing 

$$\sum_i |\hat{P}_i - P_i| / P_i . \tag{3.1}$$

To reflect the third accuracy criterion, few extreme errors (or low variation 
in error), optimization criterion (3.1) can be modified to 

$$\sum_i \left( |\hat{P}_i - P_i| / P_i \right)^{\alpha} , \tag{3.2}$$

where $\alpha$ is a number larger than 1.

Large values of $\alpha$ reflect a desire to reduce extreme errors. For example, 
consider choosing between $X$ and $Y$, two alternative sets of estimates for $n$ 
places of approximately the same size. Suppose $X$ had equal relative ab- 
solute errors of .04 for all places and $Y$ had a relative error of .20 for 1 per- 
cent of the places and .01 for 99 percent of the places. In this case the 
average relative error for $Y$ (.0119) is less than the average relative error 
for $X$ (.04). But if criterion (3.2) is used with $\alpha$ greater than or equal to 2.85, 
set $X$ will be selected. As $\alpha$ grows large without bound, minimization of 

¹In practice, when one is estimating the value of (3.1) below, the value of $P_i$ is not known but
is estimated, often on the basis of a census or survey; care is needed to adjust for error in this
estimate (see Appendix I).

²Since the errors are random, minimization of (3.1) or (3.2) refers to minimization of the ex-
pected value of (3.1) or (3.2) or of some strictly increasing transformation of (3.1). For exam-
ple, if a = 2 in (3.2), one might minimize the expectation of the square root of (3.2), often
referred to as the root-mean-square error.
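The comparison of the estimate sets X and Y can be checked numerically. The following is a minimal sketch, assuming a hypothetical n = 1000 places; the function name `mean_power_error` is illustrative. It divides (3.2) by n, which does not change which set is preferred.

```python
def mean_power_error(rel_errors, a):
    """Average of |relative error|**a; a = 1 gives the average relative error."""
    return sum(abs(e) ** a for e in rel_errors) / len(rel_errors)

# Hypothetical n = 1000 places (any multiple of 100 gives the same averages).
n = 1000
set_x = [0.04] * n                                      # .04 for all places
set_y = [0.20] * (n // 100) + [0.01] * (n - n // 100)   # .20 for 1%, .01 for 99%

def preferred(a):
    """Return the name of the set with the smaller criterion value."""
    return "X" if mean_power_error(set_x, a) < mean_power_error(set_y, a) else "Y"

for a in (1, 2, 3):
    print(a, mean_power_error(set_x, a), mean_power_error(set_y, a), preferred(a))
```

With a = 1 or a = 2 the sketch prefers Y, and with a = 3 it prefers X, in line with a crossover near a = 2.85.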


tions is a vague statement that allows several interpretations. For exam-
ple, one could seek to minimize the total number of dollars misallocated
(i.e., allocated to the wrong area). For the ith local area, let $\hat{A}_i$ and $A_i$
denote the estimated allocation and the targeted allocation if there are no
errors in the data. One seeks to minimize the expectation of

$$\sum_i |\hat{A}_i - A_i|. \qquad (3.3)$$


Note that $\hat{A}_i$ and $A_i$ involve $\hat{P}_i$ and $P_i$, respectively, in an implicit fashion,
so that (3.3) is a complicated expression of $\hat{P}_1, \ldots, \hat{P}_n$ and $P_1, \ldots, P_n$.
One may not be able to specify exactly how to choose the estimators
$\hat{P}_1, \ldots, \hat{P}_n$ to minimize (3.3); in fact, one may not even be able to write
(3.3) explicitly in terms of $\hat{P}_1, \ldots, \hat{P}_n$ and $P_1, \ldots, P_n$, but approxima-
tions are possible (see Appendix E). For illustrative purposes we
simplify greatly and assume that $\hat{A}_i$ and $A_i$ are proportional to the fraction
of total population living in the ith area; that is, we assume that

$$\hat{A}_i = c \hat{P}_i / \sum_j \hat{P}_j, \qquad A_i = c P_i / \sum_j P_j,$$

for some positive constant c not depending on i. In this special cas 
may rewrite (3.3) as 


$$c \sum_i \left| \hat{P}_i / \sum_j \hat{P}_j - P_i / \sum_j P_j \right|. \qquad (3.4)$$

Note that uniform relative errors are irrelevant; for example, if $(\hat{P}_i - P_i)/P_i$
is the same for all areas i, then (3.4) is zero. Simplifying even further,
let us assume that $\sum_j \hat{P}_j$ estimates $\sum_j P_j$ with negligible error, so that (3.3) and
(3.4) can now be expressed as

$$G \sum_i |\hat{P}_i - P_i|, \qquad (3.5)$$

where G is some positive constant of proportionality. Ignoring the con-
stant of proportionality, we notice that (3.1), (3.2), and (3.5) are special
cases of the general optimization criterion


of minimal errors in allocation of funds. What are appropriate values for a 
and q is largely a policy question and not a technical question. 
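The roles of a and q can be illustrated with a short sketch. It assumes the general criterion has the form $\sum_i P_i^q (|\hat{P}_i - P_i| / P_i)^a$; this form is an assumption, chosen because it reduces to (3.1) for q = 0 and a = 1, to (3.2) for q = 0, and to (3.5) (without the constant G) for q = 1 and a = 1. The function name and the two-area populations are hypothetical.

```python
def general_criterion(actual, estimated, q, a):
    """Sum over areas of P_i**q * (|P_hat_i - P_i| / P_i)**a (assumed form)."""
    return sum(p ** q * (abs(p_hat - p) / p) ** a
               for p, p_hat in zip(actual, estimated))

# Hypothetical actual and estimated populations for one small and one large area.
actual = [1000.0, 50000.0]
estimated = [1100.0, 49000.0]

relative = general_criterion(actual, estimated, q=0, a=1)  # form of (3.1)
absolute = general_criterion(actual, estimated, q=1, a=1)  # form of (3.5), G dropped
squared = general_criterion(actual, estimated, q=2, a=2)   # sum of squared errors

print(relative, absolute, squared)
```

In this toy case the small area contributes most of the relative-error total, while the large area dominates the absolute and squared totals, which is why the choice of a and q changes which estimator looks best.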

The following example illustrates the implications of different values of 
a and q for selecting estimators for local areas. For clarity of presentation 
we consider the substate jurisdictions of the United States partitioned into 
two groups on the basis of 1970 population counts: those with at least 
10,000 inhabitants will be called “large,” and the rest “small.” Also for 
simplicity we assume that all large places have identical population sizes 
and that all small places have identical population sizes, and we consider 
selecting an estimator for the two population sizes. Suppose we have three 
estimators, $E_1$, $E_2$, and $E_3$, which provide unbiased estimates of popula-
tion with the following coefficients of variation:⁴


                 $E_1$    $E_2$    $E_3$

Small places     .100     .085     .075
Large places     .040     .045     .050

The estimators represent different trade-offs between error for large
places and error for small places. For large places, $E_1$ is best, then $E_2$, and
last $E_3$. For small places the situation is reversed: $E_3$ is best, $E_2$ second
best, and $E_1$ worst. Which estimator is best overall?

As a rough approximation to reality, say there are 32,500 small areas, 
each with a population of 1,500 and 3,000 large areas, each with a popula- 


³Further generalizations are possible, of course: for example, in (3.6) could be replaced
by a more general term $W_i$. Alternative formulas are also possible; see Stanford Research In-
stitute (1974) or Ferreira (1978).

⁴The coefficient of variation of an estimate is its standard deviation expressed as a propor-
tion of the quantity (here, population) being estimated. We are also assuming here that the
expected absolute values of the relative errors are proportional to the coefficients of variation
(as is the case when the values of the estimators follow the normal distribution).



(C) $\sum_i (\hat{P}_i - P_i)^2 / P_i$,

(D) $\sum_i |\hat{P}_i - P_i| / P_i$.

Criterion D requires minimizing average relative absolute error, while
criterion B requires minimizing the total number of dollars misallocated
(assuming allocations are proportional to population). Criterion A is a
variant of B and is less concerned with small individual misallocations and
more concerned with large individual misallocations. Criterion C
represents a compromise between A and D. Each of these criteria implies
a different ranking of the three estimators in order of preference:⁵

criterion A: $E_1$, $E_2$, $E_3$;
criterion B: $E_2$, $E_1$, $E_3$;
criterion C: $E_2$, $E_3$, $E_1$;
criterion D: $E_3$, $E_2$, $E_1$.

Clearly, the different criteria have different implications for “best” 
criterion A, estimator $E_1$ is best; by both criteria B and C, estimator i

⁵The numerical values of criteria A-D are given by the following tabulation, for estimators
$E_1$, $E_2$, and $E_3$, where a = 10^, b = (10^)s, c = 10^, and d = (10^)s. The constant s is the
ratio of the expected absolute value of the relative error to the coefficient of variation: for er-
rors following the normal distribution, s is approximately .8:


                 $E_1$    $E_2$    $E_3$

Criterion A      10.5a    12.8a    15.7a
Criterion B      10.3b    10.2b    10.6b
Criterion C       7.0c     6.3c     6.5c
Criterion D       3.4d     2.9d     2.8d


It should be noted that comparisons between numerical values under different criteria are
not meaningful. For example, if all the values for criterion C were multiplied by 10^, it would
not affect the preference ordering represented by criterion C, but the values would be larger
than any others in the tabulation.



example, both criteria B and C favor an increase of .005 relative absolute
error for large places to get a decrease of .015 relative absolute error for
small places ($E_2$ is better than $E_1$); but criterion C favors and criterion B
does not favor an increase of .010 relative error for large places to get a
reduction of .025 for small places.

We emphasize that the optimization criteria are meaningful only inso- 
far as they represent the desires of the producer of the estimates (the Cen- 
sus Bureau in this case) for different kinds of accuracy. The above illustra- 
tion demonstrates that tractable formulations of optimization criteria can 
be useful for representing preferences for trade-offs in accuracy. Once 
preferences are stated, a representative optimization criterion can be 
determined and used for selecting estimators with the desired properties. 

Note that the optimization criteria A-D in the example above disregard 
the issue of bias for subgroups. One way to incorporate concerns about 
bias into the optimization criteria is to use constraints: only estimators 
with specified unbiasedness properties may be used. Choice of the “best” 
estimator within the class of acceptably unbiased ones is then made 
according to optimization criteria. 

Constraining or restricting the class of estimators under consideration is 
also a useful way to reflect other concerns. Criteria A-D are all related to 
aggregate error, but one might also be concerned that no component error 
be larger than a specified amount (or percent). A reasonable optimization 
strategy selects as best only an estimator whose component errors lie 
within preestablished limits. For example, the consideration of estimators 
might be restricted to those for which the expected relative absolute error 
for any place picked at random is less than 0.4. The optimization criteria 
discussed earlier could then be used to select a best estimator from within 
this restricted class. 

This kind of approach is advocated by the Office of Federal Statistical
Policy and Standards (1978), which recommends (p. 27):⁶

That since data errors are inevitable and since statistical resources are limited, 
priority be given to minimizing the very large errors which may occur in data 
used for the allocation of funds. ... To the extent that error measurements 
are available for small geographic areas one should check that relative errors 
are no greater than a prespecified maximum, but one should not be overcon- 
cerned with small errors since their effect on the total distribution is relatively 
minor. 


⁶Note the distinction between errors and relative errors.
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the various estimates (and combinations thereof) for the same areas and 
the most accurate procedure selected. The Census Bureau did use this ap- 
proach in 1973 when they took special censuses in a probability sample of 
86 areas to evaluate the accuracy of the administrative records method for 
small subcounty areas. The problem with this procedure is that the sam- 
ple of local censuses is prohibitively expensive to take for a sufficiently 
large number of areas to provide a definitive evaluation. 

The second approach is to use an existing large, high-quality sample 
survey. The high quality is necessary to ensure that sample estimates are 
unbiased, and the large size is necessary so that sample estimates for 
selected primary sampling units (psu’s) can be computed.⁷ Fortunately,
such a sample exists for the Current Population Survey (cps). It takes 
complete enumerations in 70,000 households each month. Each psu for 
the CPS consists of an independent city or county or two or more con- 
tiguous counties. The sample estimates computed for these psu’s have 
been found to be unbiased, and when the estimates are compared with 
1970 census counts, the mean relative squared deviation for the psu 
estimates (based on data pooled from five consecutive quarterly surveys) 
was less than .025 (Ericksen, 1975). 

These psu estimates can be taken as dependent variables in regression 
equations using as independent variables both symptomatic information 
usually used in population estimates and even alternative population 
estimates themselves (e.g., cm ii estimates and ar method estimates). As 
long as the error of the sample estimates has no linear trend with relation 
to the rate of population growth, the resulting regression equation yields 
an unbiased estimate of what would be computed if population counts of 
the dependent variable replaced the sample estimates. The one noticeable 
point of difference is in the size of the correlation coefficients: because a 
major component of the variance of the dependent variable is random 
sampling error, the observed correlations, but not regression coefficients, 
are shrunk. 
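The claim that sampling error in the dependent variable shrinks the correlation but not the regression coefficient can be illustrated by simulation. This is a minimal sketch with synthetic data: the slope of 2, the error standard deviations, and all variable names are hypothetical choices, and the sampling error is assumed to be unrelated to the predictor.

```python
import random

def slope_and_correlation(x, y):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
symptomatic = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical "true" counts: linear in the symptomatic variable plus model error.
counts = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in symptomatic]
# Sample estimates: the counts plus sampling error unrelated to the predictor.
sample_estimates = [c + rng.gauss(0.0, 2.0) for c in counts]

slope_counts, corr_counts = slope_and_correlation(symptomatic, counts)
slope_sample, corr_sample = slope_and_correlation(symptomatic, sample_estimates)
print("slopes:", slope_counts, slope_sample)
print("correlations:", corr_counts, corr_sample)
```

The two slopes should be close to each other, while the correlation computed from the sample estimates is noticeably smaller than the one computed from the counts.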

⁷Large can either mean high sampling rates for fewer places or low sampling rates for many
places; see Appendix H. Unbiased is here taken to mean that the expected values of the 
sample-based estimates are the same as those of estimates based on a census: that is, under- 
count is ignored. 
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approach. More research is also needed to account for errors in bench- 
marks used for evaluation, whether these benchmarks are sample survey 
estimates, census estimates, or other kinds of estimates. 


3.3 ERROR MODELS 

Error in postcensal estimates of population and income can arise from 
numerous sources. Identifying the different components of error can be 
valuable for determining where improvement of data or methodology is 
most needed. In practice, estimates of biases and variances of some error 
components may be readily available, while only approximate bounds are 
obtainable for the moments of the remainder of the components. Models 
of error allow one to combine these different pieces of information to pro- 
duce estimates (possibly, interval estimates) of the total error. Further- 
more, construction of error models leads to insight into ways to improve 
estimation procedures. 

We focus attention here on error decompositions for population esti- 
mates provided by linear models, as in the ratio-correlation method or the 
regression-sample method (Ericksen, 1974; Fay, 1979; Gonzalez and 
Hoza, 1978).'^ In Appendix G, alternative error models are presented for 
ratio-correlation estimates. Error models can be constructed in diverse 
ways, and the particular structure should be chosen to conform both to 
knowledge about components of error and to desired insights. For exam- 
ple, Appendix G uses error models to analyze the effects of undercount on 
the postcensal estimates of population obtained under several methods. 

We begin by making a simplifying assumption about the forms of the 
variables in the linear models. The ratio-correlation method uses variables 
V in the form 


$$\frac{V_i(t) / V_i(0)}{V_+(t) / V_+(0)},$$

where t refers to the current period, 0 refers to the previous censal period, 

“^The following discussion is technical and is mainly for readers familiar with regression 
theory. 
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size). 

The regression-sample method estimates $Y_t$ by $X_t \hat{\beta}_t$. The difference
between the best linear fit $X_t \beta_t$ and the fit estimated with cps data is

$$X_t \beta_t - X_t \hat{\beta}_t. \qquad (3.9)$$

We call (3.9) the error due to data error. This error component can be fur-
ther decomposed into four data components: error from bias in the cps
estimates, error from random variation in the cps estimates, error due to
differences between the characteristics of the psu’s and the units of
analysis (states and counties), and error due to the wrong choice of
weights in $W_t$.

The ratio-correlation method uses the same regression model to make
postcensal estimates for all t until the next census. These estimates have
the form $X_t \hat{\beta}_0$, where $\hat{\beta}_0$, the estimated vector of k regression coefficients,
is determined on the basis of data from the time t = 0 (the previous census
year) and $X_t$ is as defined above. If we denote by $Y_0$ the vector of census
counts for the n populations and if the optimization criterion is weighted
least squares, then $\hat{\beta}_0$ is given by

$$\hat{\beta}_0 = (X_0' W_0 X_0)^{-1} X_0' W_0 Y_0,$$

where $W_0$ is a matrix ($n \times n$) of weights (and we assume that $W_0$ is non-
singular and $X_0$ has rank k). The weights $W_0$ correspond more closely to
Wi than to $W_t$, since there is no sampling variance in $Y_0$ to be adjusted
for. Generally, as with $W_t$ in (3.8), the matrix $W_0$ is chosen to be the iden-
tity, and thus unweighted regression is used.

The quantity 


$$X_t \beta_t - X_t \hat{\beta}_0 \qquad (3.10)$$


measures the difference between the predictions under the best linear fit 


$$(Y_t - X_t \beta_t) + (X_t \beta_t - X_t \hat{\beta}_t),$$

the sum of error due to the model and of error due to data, where the lat-
ter term decomposes into four components (as is discussed above). For the
ratio-correlation estimates, the error $Y_t - \hat{Y}_t$ equals

$$(Y_t - X_t \beta_t) + (X_t \beta_t - X_t \hat{\beta}_0),$$

the sum of error due to the model and of error due to structural change in 
regression. 
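The decomposition for the ratio-correlation estimates can be traced on a small synthetic example. This is a sketch only: the five areas, their values, and the helper names `wls_fit` and `predict` are hypothetical, the model has just an intercept and one symptomatic ratio, and the weights are taken to be the identity (unweighted regression).

```python
def wls_fit(X, y, w):
    """Weighted least squares for rows [1, x]: solve (X'WX) b = X'Wy directly."""
    s00 = sum(wi * r[0] * r[0] for r, wi in zip(X, w))
    s01 = sum(wi * r[0] * r[1] for r, wi in zip(X, w))
    s11 = sum(wi * r[1] * r[1] for r, wi in zip(X, w))
    t0 = sum(wi * r[0] * yi for r, yi, wi in zip(X, y, w))
    t1 = sum(wi * r[1] * yi for r, yi, wi in zip(X, y, w))
    det = s00 * s11 - s01 * s01  # nonzero when the x values are not all equal
    return [(s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]

def predict(X, b):
    """Fitted values X b for a two-coefficient model."""
    return [r[0] * b[0] + r[1] * b[1] for r in X]

# Hypothetical data for five areas at the census (time 0) and at time t.
X0 = [[1, 0.9], [1, 1.0], [1, 1.1], [1, 1.2], [1, 0.8]]
Y0 = [0.92, 1.01, 1.08, 1.17, 0.83]
Xt = [[1, 0.95], [1, 1.05], [1, 1.0], [1, 1.3], [1, 0.7]]
Yt = [0.97, 1.02, 1.01, 1.20, 0.78]
w = [1.0] * 5  # identity weights

beta_0 = wls_fit(X0, Y0, w)  # coefficients fitted at the previous census
beta_t = wls_fit(Xt, Yt, w)  # best linear fit at time t (needs the true Y_t)

total = [y - f for y, f in zip(Yt, predict(Xt, beta_0))]
model = [y - f for y, f in zip(Yt, predict(Xt, beta_t))]
structural = [ft - f0 for ft, f0 in zip(predict(Xt, beta_t), predict(Xt, beta_0))]

for t, m, s in zip(total, model, structural):
    print(round(t, 4), round(m, 4), round(s, 4))
```

For every area the total error equals the model error plus the error due to structural change, which is the identity stated above; in practice the best linear fit at time t is not observable and the components must be estimated.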

The individual error components can be isolated, and their properties 
(mean, variance, etc.) estimated. Fay (1979) uses interview data for the 
Survey on Income and Education (sie) to study separately the error in the 
model for regression-based estimates as well as the error from bias and the 
error from random variation of the sie estimates of the number of 
children in poverty. Fay also considers the error component arising from 
structural changes in regression for the problem of postcensal population 
estimation. He computes the principal components of various symp- 
tomatic indicators of population (school enrollment, labor force, and tax 
returns) and compares them in different years. This permits analysis of 
whether prediction models change over time because of changes in the in- 
terrelationships of the symptomatic variables apart from changes in their 
relationship to the dependent variable (population). 

To summarize, the error components in linear models appear to be 
estimable, although much work remains to be done to develop additional 
theory and methods. We think that further development of methods and 
appropriate data collection, where practicable, will greatly enhance the 
ability of the Census Bureau to produce more accurate estimates and to 
understand the structure of errors. The Panel encourages the Census 
Bureau to undertake such efforts. 



Testing Estimates 
Against the 
1980 Census 


4.1 POPULATION 

The 1980 census presents the first opportunity to extensively test the 
postcensal estimation methods used in the 1970s. While there are prob- 
lems with using a decennial census to evaluate estimates (see section 3.1 
and Appendix I), such tests are still the most powerful tool for evaluating 
the accuracy of estimation methods. The first step in performing tests is 
to state and justify clearly the evaluation criteria to be used (see sections 
1.1d and 3.1).

Several basic questions are of interest: 

1. How accurate are the estimates of total population and per capita 
income for different geographic levels (states, counties, subcounty
areas)? 

2. How does accuracy vary with characteristics of an area, such as 
population size, rate of population growth, and age distribution of the 
area’s population? 

3. How does accuracy vary with time? 

4. What other characteristics are associated with the accuracy of the 
population estimates? 

5. Are the estimates biased for certain classes of units? 

6. How do the current methods compare in accuracy? 

7. How do alternative methods compare? What effects on accuracy 
would result from modifications to the methods? 
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4.1a CENSUS BUREAU PLANS 

The Census Bureau has prepared an outline of possible 
methods against the 1980 census. ‘ The outline is extensive, a 
summary follows. Questions 1, 2, 5, and 6 above are conside 
summary tabulations similar in form to Tables 2.1-2.11. O 
tions can also be done to study separately the accuracy of tl 
for areas undergoing boundary changes and annexations. Q 
considered by focusing on areas that received special cens 
the 1970s and comparing the deviations of the estimates fron 
censuses with the deviations of the estimates for April 1, 19^ 
1980 census counts. 

A variety of alternative methods are mentioned in the outli 
dates for testing; these include methods described in the li 
not now used by the Bureau, as well as new methodology beir 
at the Bureau. In particular, at the subcounty level the 1 
method may be tested for the approximately 16,000 subcoui 
which the requisite data are available. 

To consider question 8, the Bureau may recompute estimc 
(where possible) census data in place of administrative dal 
paring deviations of these estimates with the deviations of 
estimates from the 1980 census. 

A variety of possible modifications to methods are pres 
outline. For county estimates these include use of optimal 
alternate ways of performing the “rake/float” adjustments ( 
Appendix A, section 4.2). For the ar method, possible mod 
elude adjusting the migration rate computations for diffei 
patterns by race and expanding the time intervals betwe 
years. For the subcounty estimates the possible modificati 
elude computation of the ar estimates separately for popul 

‘This outline is T80-CV, Explanatory Notes on “1980 Tests— Outline" an 
Outline, an unpublished working document of the Census Bureau. 


allow the Census Bureau to conduct all the tests. This section presents
the Panel’s comments and suggestions for the tests. 

The decomposition of total error into data error and error in the 
model (see question 8), as considered by the Bureau in the outline, is 
important. The prime candidate for such a decomposition is perhaps 
component method II. The likely approach would be to first estimate 
total error from the deviations of the 1980 cm ii estimates from the cen- 
sus results. Alternative estimates could also be prepared with the same 
methodology but using census data instead of symptomatic data. Under 
certain assumptions, error in the alternative estimates — estimable from 
the deviations of these estimates from the 1980 census results — may be 
thought of as error in the model. The difference between the total error 
and the error in the model is the data error. Such a decomposition will 
be helpful in determining how to improve the estimates. Work is needed, 
however, to try to estimate the effect of error in the 1970 and 1980 cen- 
sus data. In the description just presented, this error is incorporated in 
the component of error attributed to the model. 

A similar decomposition of error should be performed for the ratio- 
correlation method. The total error in the rc estimates may be de- 
composed into the error in the model and the error due to structural 
change in the regression (as described in section 3.3). Total error may be 
estimated from the deviations of the 1980 rc estimates from the census re- 
sults. An alternative regression equation can also be constructed from 
symptomatic data and census data for 1970 and 1980. The deviations of 
the estimates yielded by this alternative equation from the 1980 census 
results form the basis for estimating the error in the model. The differ- 
ence between the total error and the error in the rc model is the error 
due to structural change. 

Optimal weights for the averaging of estimates can improve the ac- 
curacy of estimates over that of equally weighted averages. (A method of 
determining optimal weights is discussed below.) Optimal weights for 
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:ome series of the Bureau of Economic Analysis (bea) for counties and 
states.

A test should be conducted for all counties and for a probability
sample of subcounty areas with tabulations classified by population size
of area, rate of change in population, and, for counties, proportion of
population that is rural (test 1, in Table 4.1). A second test should de-
termine the compatibility between the Census Bureau’s postcensal esti-
mates as now computed and the personal income estimates compiled by
BEA. The BEA data will have to be adjusted to make the income base
compatible with the Census Bureau’s money income estimates (test 2,
Table 4.1). A third test would involve comparing the 1979 bea adjusted
data with the 1979 census pci. Tests should be conducted for large


TABLE 4.1 Proposed Tests for pci Estimates 


Test Data Source 1979 pci Estimate 


1 

1979 postcensal pci 

1979 census pci 

2 

1979 postcensal pci 

bea„*/ 1980 population 

3 

1979 BEA,, 1979 BEA,,* 

1980 population’ 1980 population 

1979 census pci 

4 

1979 BEA,, 

X 1969 census pci 

1969 BEA,, 

1979 census pci 

5 

1979 BEA,, 

X 1969 census pci 

1969 BEA,, 

1979 census pci 

6 

regression pci estimates 

f 1979 census pci 

7 

regression bea,, /pop 

1.1979 postcensal pci 


* Adjusted to approximate total money income; uses bea,, for adjusted, bea,, for 
unadjusted. 




county’s bea personal income figure over the period 1969- 
this growth factor has been determined, it could be applied tc 
Bureau’s pci for 1969. By multiplying the 1969 pci against 
factor, an effort could be made to determine how closely th 
with the census pci for 1979 (test 4, Table 4.1). Again, this 
be designed for counties with a large proportion of farm ina 
counties for which farm income is insignificant. Counties with
ulation should be included in the tests and compared with c 
smaller populations. The accuracy of estimates for rapid 
counties should be compared with the accuracy for more stal 

The same tests could be repeated by using unadjusted bi 
by comparing it to the 1980 census estimates of 1979 pci (te 
Table 4.1). While there are reasons for adjusting bea data, 
rents and in-kind income may not be as significant as some t 
should be done to determine if the unadjusted data produc 
able an estimator as do the adjusted data. 

Other tests should be conducted to determine if it is fe< 
regression techniques for measuring income changes. Per cj 
could be regressed against such factors as the proportion 
come in wages and salaries, the proportion of total income 
come, the age structure of the population, employment in thi 
returns filed, and other economic variables (test 6, Table 4 
be feasible to construct an income estimating program si 
ratio- correlation estimating program for population for stat( 
units. Regression techniques might also be used for gen( 
censal estimates of per capita personal income for counties ( 
4.1). The regression estimates might be able to be produced 
than the usual personal income estimates. (It is doubtful thi 
sion technique could be extended to subcounty units.) Thes€ 
be conducted against both the 1979 census income estim 
postcensal 1979 pci estimates. 



Technical Critique 
of Methods 
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5.1 CRITIQUE OF INDIVIDUAL METHODS OF 
ESTIMATION 

In general, the individual methods used by the Census Bureau to esti-
mate population, namely component method II (cm ii), the ratio-correlation
method (rc), and the administrative records method (ar), appear sound. The method
used to estimate per capita income is also sound. As was discussed 
earlier, however, the methods can produce highly inaccurate estimates 
for small areas, largely because of the lack of adequate data for produc- 
ing accurate estimates. The Panel does not know of any methods — short 
of prohibitively expensive sample surveys, annual censuses, or popula- 
tion register systems — that would yield significantly more accurate esti- 
mates on an annual basis. 

This chapter presents the Panel’s specific criticisms of the three 
methods for population estimation (cm ii, rc, and ar) and also of the 
method for income estimation. When it is possible, alternative proce- 
dures and tests are suggested. (The reader is referred to Appendix A for 
details on the population estimation methods and to Appendix B for a 
summary of the income estimation method. The reader may wish to re- 
fer to those appendices in following the details of the criticisms pre- 
sented here.) While our suggestions cannot solve the problem of inac- 
curate estimates for very small areas, they would lead to qualitative 
improvements that may prove to be significant for some areas. 

The Panel recognizes that many of our criticisms are minor and per- 



5.1a COMPONENT METHOD II


Our critique of cm ii concerns the procedures for estimating i 
tion. In particular, our discussion focuses on proper denomi 
migration rates. The component method II estimates net migr: 
data on changes in school enrollments. We have two technical 
of this approach: tests are needed to determine whether the crii 
quantitatively significant. 

The migration “rates” used in cm ii for state and county 
are not consistent. The denominators for the school-age a 
migration rates for the period preceding the census (the preci 
ngqmigyrat(O) and sclmigrat(O) discussed in Appendix / 
2.2 and 3.4d) consist of the relevant population in the parti< 
tion enumerated in the census (that is, at the end of the pre 
gration interval). Use of such a denominator need not necess 
any problem. But the denominator of the postcensal school-; 
tion rate is the expected survivors (assuming births and dea 
migration) of school age in the particular location at the midp 
postcensal interval. Hence the denominators of the two measi 
conform. The denominators would be consistent if the actual 
population at the end of the postcensal period, not the expecte 
(assuming natural increase but no migration) in the middle of 
were employed. 

The postcensal migration rate R would then properly be a 
numerator is the number of surviving net migrants and who: 
nator is the number of inhabitants of the area at the end of i 
Hence if R were multiplied by an appropriate base, the resul 
the number of surviving migrants at the end of the period. £ 
needed for the migration component of cm ii is the total net 
migrants during the interval, regardless of whether the inn 
outmigrants survived, since the methodology assumes that 


$$P_1 = P_0 + B - D + I - O,$$
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where all events are charged against that area and where Pi is post- 
censal population, $P_0$ is censal population, and B, D, I, and O refer to
the total numbers of births, deaths, inmigrations, and outmigrations,
respectively, occurring in the area during the period.¹ There does not
appear to be an easy procedure for converting a migration rate pertain-
ing to surviving net migrants into one measuring total net migrants.
Although they are certainly not equivalent, an easy solution would be to
assume that they were. In that case the proper base to multiply R by in
order to estimate total net migration would be

$$\mathrm{BASE} = P_0 + \tfrac{1}{2}(B - D) + \tfrac{1}{2}M,$$

where $M = I - O$. Then since M equals R times BASE,

$$\mathrm{BASE} = [P_0 + \tfrac{1}{2}(B - D)] / (1 - R/2).$$

The difference between the estimates of migration using BASE as just
defined² and the base population currently used by the Census Bureau,
$P_0 + \tfrac{1}{2}(B - D)$, is

$$\mathrm{BASE} \cdot R^2 / 2. \qquad (5.1)$$

This difference is obviously nonnegative, regardless of the sign of R.
Because the estimates of county (state) population are controlled to
state (national) totals, the final effect of using BASE as a base population
figure is not obvious. We suspect the effects will generally be minor, but
in some cases they will not be negligible, especially when there is wide
variation in R. A simple numerical example will illustrate this. Suppose
one is estimating the population of 50 areas (e.g., states or counties in a
state) and suppose that all areas have the same population $P_0$ at the
beginning of the estimation interval and also that there is a constant
proportional natural increase, S (less than 10 percent).³ Consider three
classes of areas for which the values of R are 0.15, 0.10, and 0.01, re-
spectively, and let the numbers of areas in these classes be 3, 4, and 43.
(This distribution of R over areas approximates the distribution of the
5-year migration rates for 1970-1975 for states, as published by the
Bureau of the Census (1976, Table 1) except that here we always take
¹Group quarters populations are ignored in this discussion.

²Here we refer to the estimates before higher-level controls are applied.

³This last assumption has minor impact but simplifies the analysis; larger values and
nonconstancy of S would not alter implications substantively.
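The effect of the alternative base can be sketched for the three migration rates used in the example. The population, births, and deaths below are hypothetical (they give a natural increase of 5 percent), the function names are illustrative, and the figures are computed before any higher-level controls are applied.

```python
def current_base(p0, births, deaths):
    """Base now used: P0 + (B - D) / 2."""
    return p0 + 0.5 * (births - deaths)

def proposed_base(p0, births, deaths, r):
    """BASE = [P0 + (B - D) / 2] / (1 - R / 2)."""
    return current_base(p0, births, deaths) / (1.0 - r / 2.0)

# Hypothetical area: population 100,000 with natural increase S = 5 percent.
p0, births, deaths = 100_000, 12_000, 7_000

for r in (0.15, 0.10, 0.01):
    old = current_base(p0, births, deaths)
    new = proposed_base(p0, births, deaths, r)
    difference = r * new - r * old  # change in estimated net migration
    print(r, round(new), round(difference), round(new * r ** 2 / 2))
```

The last two printed columns agree, which is the content of (5.1): the change in estimated net migration equals BASE times R squared over 2, so it grows quickly with the size of R.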
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5.1b ADMINISTRATIVE RECORDS METHOD 

The administrative records method (ar) is a component method of esti- 
mation that develops migration estimates on the basis of the numbers of 
tax returns matched across years according to social security numbers. 
(The method is described in detail in Appendix A.) Our principal cri- 
ticism of the ar method concerns estimation of net inmigration. The 
administrative records method develops migration estimates on the basis 
of changes in address on tax returns, and there is the possibility of 
biases arising from different filing rates for different segments of the 
population. The bias in the estimate may be severe for some areas, but 
there is insufficient evidence to draw a conclusion. Our other criticisms 
are qualitatively minor. We suspect they are not quantitatively signifi- 
cant, but only tests can determine if this is true. 


5.1b(l) Biases 

To calculate the net number of migrants to an area, the ar method 
multiplies the estimated migration rate by a base population figure. The 
net inmigration rate (irsrat) is estimated by 

$$\mathrm{irsrat} = \frac{\mathrm{INS} - \mathrm{OUTS}}{\mathrm{OUTS} + \mathrm{NONMOV}},$$

where ins, outs, and nonmov refer to numbers of exemptions on 
IRS tax returns matched by social security number across 2 years (see 
Appendix A, section 3.7 for precise definitions). This calculation ex- 
cludes those segments of the population not represented by exemptions 
on matched tax returns. Because the excluded populations often have 
different migration patterns than those covered by the tax returns, the 
estimates of net migration can be biased for many areas. Excluded 
populations tend to include, disproportionately, many aged, minority, 
and, possibly, low-income people.^ In a recent report (Bureau of the Cen- 
sus, 1978), David Word outlined a method for adjusting the migration 
estimates to remove the biases. The technique is summarized below; in 
the Panel’s view, the technique shows promise and should be evaluated 
when 1980 census results are available. 
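
The rate calculation at the start of this subsection can be sketched in a
few lines of Python. This is only an illustration of the formula as stated
in the text: the function names and the exemption counts are hypothetical,
and the base population is taken as given rather than built up from its
components.

```python
def irs_net_migration_rate(ins: float, outs: float, nonmov: float) -> float:
    """Net inmigration rate: (INS - OUTS) / (OUTS + NONMOV)."""
    denominator = outs + nonmov
    if denominator <= 0:
        raise ValueError("OUTS + NONMOV must be positive")
    return (ins - outs) / denominator


def net_migrants(ins: float, outs: float, nonmov: float,
                 base_population: float) -> float:
    """Net number of migrants = rate x base population."""
    return irs_net_migration_rate(ins, outs, nonmov) * base_population


# Hypothetical exemption counts for one area (not taken from the report).
rate = irs_net_migration_rate(ins=1200, outs=1000, nonmov=9000)
migrants = net_migrants(ins=1200, outs=1000, nonmov=9000,
                        base_population=12000)
print(rate, migrants)
```

Only persons represented by exemptions on matched returns enter INS, OUTS,
and NONMOV, which is the source of the bias discussed in this subsection.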

This technique defines the coverage ratio as the ratio of exemptions
contained on tax returns filed by persons under 65 years of age to the
population under 65. The 1970 national coverage ratio for [...] 82.2
percent, and that for whites a surprising 101 percent (Bureau of the
Census, 1978). This anomaly for whites is explained by the [...]
multiple exemptions for individuals and by undercount in the [...]
census.^ It does not imply that the entire white population was [...]
the tax system. Also important in this method is the match ratio,
defined as the ratio of the number of tax returns matched for the 2
years to the average of the number of returns filed in both years.
Following Bureau of the Census (1978), we define the efficiency ratio as
the product of the match ratio and the coverage ratio: efficiency ratio
= match ratio × coverage ratio. The efficiency ratio roughly indicates
the fraction of the population for which we have data to estimate
internal migration.

^Exclusion of persons over 65 is not significant for AR estimates at the state and county
levels, where Medicare data rather than tax data are the basis for the estimates.

Efficiency ratios vary significantly by age, race, sex, and income.
Error is introduced into IRSRAT because area migration rates also vary
by age, race, sex, and income. For example, Word estimated efficiency
ratios in Mississippi for 1970-1975 at 85.8 percent for whites and
42.7 percent for blacks. Assuming that the efficiency ratio for each
race did not vary by migration group (inmigrants, outmigrants, and
nonmovers), Word calculated that the net migration for Mississippi for
[...] was estimated too high by 22,000 persons by the AR method, [...]
percent of Mississippi's 1970 population.

It is suspected by some that for large cities the estimates [...]
migrants into central cities tend to be young, nonwhite, and [...]
persons, and many of the inmigrants to the cities file tax returns for
the first time after they migrate (see Mann, 1978). If this is true, the
efficiency ratio for inmigrants would tend to be lower than that for
nonmovers or outmigrants from the large cities, so that net inmigration
to large cities is underestimated. Lowe et al. (1974) compared AR
estimates to special censuses taken in Washington and found that the AR
method tended to overestimate the populations of suburban [...]
(municipalities within 30 miles of metropolitan cities). They [...]
that the AR method underestimated the population of cities [...]
with large proportions of agricultural, construction, or mining [...].
Full evaluation of the biases in the AR estimates must await the
results of tests against the 1980 census. If these test results support
the reasoning above, then adjustments for bias should be considered. One
possibility is to stratify tax returns by variables i (e.g., race, age,
etc.) and calculate INS, OUTS, and NONMOV separately for each stratum.

^For example, some persons in high school or college file tax returns to obtain [...]
Then the net migration rate would be estimated by

\[ \frac{\sum_i (\mathrm{INS}_i - \mathrm{OUTS}_i)\,F_i}{\sum_i (\mathrm{OUTS}_i + \mathrm{NONMOV}_i)\,F_i}, \]

where F_i is the reciprocal of the efficiency ratio for stratum i.
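
A minimal sketch of this stratified adjustment follows, assuming each
stratum's efficiency ratio is known and is the same for inmigrants,
outmigrants, and nonmovers. The `Stratum` type, the function names, and
all of the numbers are hypothetical and chosen only to show how
reweighting can change the sign of the estimated rate.

```python
from typing import Iterable, NamedTuple


class Stratum(NamedTuple):
    ins: float
    outs: float
    nonmov: float
    efficiency: float  # match ratio x coverage ratio (must be > 0)


def unadjusted_rate(strata: Iterable[Stratum]) -> float:
    """Pooled rate with no weighting: sum(INS - OUTS) / sum(OUTS + NONMOV)."""
    strata = list(strata)
    num = sum(s.ins - s.outs for s in strata)
    den = sum(s.outs + s.nonmov for s in strata)
    return num / den


def adjusted_rate(strata: Iterable[Stratum]) -> float:
    """Weight each stratum by F_i = 1 / efficiency ratio, as in the formula."""
    num = 0.0
    den = 0.0
    for s in strata:
        if s.efficiency <= 0:
            raise ValueError("efficiency ratio must be positive")
        f = 1.0 / s.efficiency
        num += (s.ins - s.outs) * f
        den += (s.outs + s.nonmov) * f
    return num / den


# Two hypothetical strata: the poorly covered one has net outmigration.
strata = [
    Stratum(ins=500, outs=400, nonmov=8000, efficiency=0.8),
    Stratum(ins=100, outs=200, nonmov=2000, efficiency=0.4),
]
raw = unadjusted_rate(strata)
adj = adjusted_rate(strata)
print(raw, adj)
```

In this made-up case the pooled rate is zero while the weighted rate is
negative, because the stratum with net outmigration is underrepresented
on matched returns.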

The Panel recognizes that there are practical problems to stratifying
by variables i. For example, the tax returns themselves do not provide
information on race or age (other than over 65 or under 18). In the
study described by Bureau of the Census (1978), race information was
obtained by matching the IRS file with a sample of the Social Security
Administration's summary earnings file. The latter file can also be used
to provide age data. We commend the Census Bureau for its efforts to
develop ways to adjust for biases in the AR method, and we encourage
further work to extend the techniques to counties and cities with moderate
to large populations. The usefulness of adjustment techniques such as
the one described above should be determined by tests of adjusted
estimates against 1980 census results.


5.1b(2) Central Rates 

The denominator of the migration rate calculated on the basis of the IRS
returns is the number of nonmovers plus the number who moved away
during the time interval under consideration. The rate is applied to a
base population defined as the initial population plus half the births
minus half the deaths and minus half the net number of immigrants from
foreign countries.^ Both the numerator and the denominator of the rate
are derived from the number of exemptions in the final year of the time
interval. Hence the rate is not a central rate (m_x). To convert the
migration rate into an approximate central rate, several changes are
needed:

1. INS, OUTS, and NONMOV should be measured as the average of the
numbers of exemptions listed on both returns.

2. The denominator would then consist of NONMOV and half the sum
of INS and OUTS, where all are defined as in point 1.

3. The old base should be divided by a factor 1 - IRSRAT/2 to yield a
new base that equals the old base plus half the net migration during the
period. This base should not include half those aliens who immigrate
(net) into the region, since immigrants are specifically counted
separately. The initial population should exclude those living in group
quarters.

4. Both the denominator of the rate and the base would then be based
on a person-years of exposure concept.

5. If points 2 and 3 but not 1 are applied, then the estimate of
migration would be identical to that of the Census Bureau.^ Thus simply
instituting the change in point 1 and following the census procedure
thereafter will produce a result equivalent to points 1, 2, and 3.

^For state and county estimates the base population also includes half the net movement
from military barracks to non-group quarters.
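
The equivalence claimed in point 5 can be checked numerically. The sketch
below is an illustration under stated assumptions, not the Census Bureau's
code: it applies the same base (excluding immigrants) in both calculations,
uses the central rate in the divisor of point 3, and all function names and
figures are hypothetical.

```python
def census_net_migration(ins: float, outs: float, nonmov: float,
                         base: float) -> float:
    """Current procedure: rate with denominator NONMOV + OUTS, times base."""
    return (ins - outs) / (nonmov + outs) * base


def central_net_migration(ins: float, outs: float, nonmov: float,
                          base: float) -> float:
    """Points 2 and 3: central-rate denominator and an enlarged base."""
    central_rate = (ins - outs) / (nonmov + 0.5 * (ins + outs))
    new_base = base / (1.0 - central_rate / 2.0)  # old base + half net migration
    return central_rate * new_base


# Hypothetical inputs; base = POP(0) + (B - D) / 2 with immigrants excluded.
m = census_net_migration(ins=1200, outs=1000, nonmov=9000, base=12000)
m_star = central_net_migration(ins=1200, outs=1000, nonmov=9000, base=12000)
print(m, m_star)
```

The two results agree up to floating-point rounding, which is why only the
change in point 1 alters the estimate.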


5.1b(3) Migration Controls 

In the component methods the final estimate of migration is a residual.
The initial population estimates are controlled to the population total of
the next higher unit (subcounties controlled to counties, counties to
states, states to the nation). Births and deaths are accepted as being
completely registered (or estimated at the subcounty level), and the final
net migration component is simply the difference between the final
estimate of population and the initial population estimate updated by
births and deaths. It is clear, however, that if the initial population
estimate, births, and deaths are accepted as being correct, then it would
logically make more sense to control the migration component itself
(rather than the population total). An alternative procedure exists for
such a control when separate estimates of inmigrants and outmigrants are
available. Hence the following discussion applies only to the AR method,
and not to component method II or the ratio-correlation method.

Let MIGIN be the uncontrolled estimate of inmigrants from IRS data, and
let MIGOUT be the uncontrolled estimate of outmigrants from IRS data; thus
MIGIN = [INS/(OUTS + NONMOV)] × MIGBASE. If ΣMIGIN - ΣMIGOUT is
supposed to equal a control value K but does not, then one simply finds


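^The Census Bureau's current procedure estimates net internal migration by

\[ M = \frac{\mathrm{INS} - \mathrm{OUTS}}{\mathrm{NONMOV} + \mathrm{OUTS}} \left[ \mathrm{POP}(0) + \tfrac{1}{2}(B - D) + \tfrac{1}{2}\,\mathrm{IMMIG} \right]. \]

If points 2 and 3 but not 1 are applied, the net internal migration estimate is

\[ M^* = \frac{\mathrm{INS} - \mathrm{OUTS}}{\mathrm{NONMOV} + \tfrac{1}{2}(\mathrm{INS} + \mathrm{OUTS})} \cdot \frac{\mathrm{POP}(0) + \tfrac{1}{2}(B - D)}{1 - \tfrac{1}{2}\,\dfrac{\mathrm{INS} - \mathrm{OUTS}}{\mathrm{NONMOV} + \tfrac{1}{2}(\mathrm{INS} + \mathrm{OUTS})}}. \]

Hence M = M* (apart from the immigrant term, which point 3 excludes from the base).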


a value α such that (1 - α)ΣMIGIN - (1 + α)ΣMIGOUT = K. This yields

\[ \alpha = \frac{\sum \mathrm{MIGIN} - \sum \mathrm{MIGOUT} - K}{\sum \mathrm{MIGIN} + \sum \mathrm{MIGOUT}}. \tag{5.5} \]

The controlled or final estimates of inmigration and outmigration for
each area would become (1 - α)MIGIN and (1 + α)MIGOUT, respectively.
Note that α will normally be very small, since net migration is
normally small in relation to gross migration. As an example of the use
of (5.5), consider the migration components for states: they must sum to
zero, since internal migrants from one state must go to another. If
ΣMIGIN - ΣMIGOUT does not equal zero, a small adjustment would force this
total. For subcounty (county) areas, ΣMIGIN and ΣMIGOUT are adjusted
to sum to the estimated migration component for the county (state).
Thus in AR, births, deaths, and all alien immigration would be accepted
as if they were true, and only the group and non-group migration would
be scaled to sum to a given total yielded from the final estimate for the
next higher geographic level.

This proposed method of controlling may produce nontrivial changes
in estimates for areas in which the current method for controlling to
totals produces changes in the area's total population that are large in
relation to the estimated net internal migration.
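
The control step in equation (5.5) can be sketched as follows. The function
name and the three-area figures are hypothetical; the code simply solves for
the adjustment factor and rescales gross inmigration and outmigration so
that their difference matches the control value K.

```python
from typing import List, Sequence, Tuple


def control_migration(migin: Sequence[float], migout: Sequence[float],
                      k: float) -> Tuple[float, List[float], List[float]]:
    """Return (alpha, controlled MIGIN, controlled MIGOUT) per equation (5.5)."""
    total_in = sum(migin)
    total_out = sum(migout)
    if total_in + total_out <= 0:
        raise ValueError("gross migration must be positive")
    alpha = (total_in - total_out - k) / (total_in + total_out)
    controlled_in = [(1.0 - alpha) * x for x in migin]
    controlled_out = [(1.0 + alpha) * x for x in migout]
    return alpha, controlled_in, controlled_out


# Hypothetical uncontrolled estimates for three areas; control total K = 0,
# as for internal migration summed over all states.
alpha, cin, cout = control_migration(
    migin=[500.0, 300.0, 200.0],
    migout=[400.0, 350.0, 210.0],
    k=0.0,
)
print(alpha, sum(cin) - sum(cout))
```

Because the adjustment is spread over gross flows rather than over total
population, α stays small whenever net migration is small relative to gross
migration.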


5.1c RATIO-CORRELATION METHOD

The ratio-correlation method is widely used, and its application by the
Census Bureau suffers from the same problems found elsewhere. The
procedure assumes that the vector of regression coefficients for
symptomatic variables is invariant from the immediately preceding
intercensal period to the postcensal period in question. However, this
invariance does not hold over time, both because of structural changes in
the underlying relationships of the variables and because the quality of
the symptomatic data varies.

The problem of structural changes in the variables' relationships is
widely appreciated (e.g., Bureau of the Census, 1974). This problem
may be most severe when the number of variables in the ratio-correlation
equation is large, so that collinearity becomes important. For example,
Namboodiri and Lalu (1971), in a test of ratio-correlation estimators for
counties in North Carolina, found that the average of five univariate
regressions produced more accurate estimates than did the fitted
five-variable equation. The explanation is that although the five-variable
equation is best in the base period, this optimality need not hold over
time, since the relationships of the symptomatic variables to each other
and to the variables of interest change. To resolve this problem, more
research is needed. Use of current sample data, as discussed in section
3.2, is one approach. Additional insight may be provided by multivariate
techniques such as principal components analysis (see Fay, 1979, pp.
179-183).

5.1d ISSUES IN INTERNATIONAL MIGRATION

The estimated total net number of international migrants is distributed 
first among states and then among places in states. The estimated net 
national number of immigrants is obtained by taking the number of 
legal immigrants reported by the Immigration and Naturalization Service 
and subtracting a constant of 36,000 as the number of emigrants.^ For 
example, the resulting net number of international migrants was 343,000 
for calendar year 1978 (Bureau of the Census, 1979). The allocation 
among states is determined by the intended residence claimed by legal 
immigrants on forms collected by the Service. Place of residence is also 
coded if its population exceeds 100,000. Allocation among places pro- 
ceeds in two steps. From the forms, the fraction of immigrants intend- 
ing to reside in places of over 100,000 is determined. This fraction times 
the estimated net number of immigrants is allocated among places ex- 
ceeding 100,000 in the same proportions given by the forms. The re- 
maining net number of estimated immigrants is allocated among places 
not exceeding 100,000 on the basis of the distribution of the foreign born 
recorded in the 1970 census. County estimates are obtained by summing 
estimates of places within counties. 
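
The two-step allocation among places described above can be sketched as
follows. This is a simplified illustration for a single state: the function
names, place names, and counts are hypothetical, and the only figure taken
from the text is the constant of 36,000 emigrants.

```python
from typing import Dict

ASSUMED_EMIGRANTS = 36_000  # constant subtracted at the national level


def national_net_immigrants(legal_immigrants: float) -> float:
    """Legal immigrants reported, minus the assumed number of emigrants."""
    return legal_immigrants - ASSUMED_EMIGRANTS


def allocate_to_places(net_total: float,
                       forms_large: Dict[str, float],
                       forms_all: float,
                       foreign_born_small: Dict[str, float]) -> Dict[str, float]:
    """Step 1: places over 100,000 share by intended-residence forms.
    Step 2: the remainder goes to smaller places by 1970 foreign born."""
    large_forms = sum(forms_large.values())
    share_large = large_forms / forms_all
    to_large = share_large * net_total
    allocation = {place: to_large * n / large_forms
                  for place, n in forms_large.items()}
    remainder = net_total - to_large
    foreign_born_total = sum(foreign_born_small.values())
    for place, fb in foreign_born_small.items():
        allocation[place] = remainder * fb / foreign_born_total
    return allocation


# Hypothetical state: 5,000 forms in all, 4,000 naming two large cities.
allocation = allocate_to_places(
    net_total=10_000.0,
    forms_large={"City A": 3_000.0, "City B": 1_000.0},
    forms_all=5_000.0,
    foreign_born_small={"Town C": 600.0, "Town D": 400.0},
)
print(allocation)
```

The sketch makes the sources of error easy to see: the large-place shares
depend entirely on intended residence, and the small-place shares depend
entirely on the 1970 distribution of the foreign born.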

This description identifies several sources of error, which in certain 
instances (discussed below) might cause serious distortion in the popula- 
tion estimates. First, to the extent that the place of intended residence 
is not the same as the place of actual residence, there will be distortions 
in the allocation of the national net number of international migrants. 
Further, the allocations are based on information obtained from immi- 
grants; undoubtedly, the geographic distribution of emigrants is dif- 
ferent. 

Second, changes in the total net number of immigrants would result
in the same relative net allocation of immigrants to places but would
alter the relative distribution of total population, since net immigrants
are not a constant fraction of total population across the country. Hence
changes in the constant used to estimate the number of emigrants
would affect, although to a small extent, the estimated distribution of
population.

^The Immigration and Naturalization Service stopped collecting data on alien emigration
in 1957; permanent departures of U.S. citizens are also not recorded (see section 1.2c of
Appendix A for more discussion).

Third, net illegal immigration is ignored entirely in the estimate of net
international migration. If some number were assumed and added to
the total net figure, the allocation of net immigrants among places would
be increased proportionally, but the distribution of population would
change (as above). However, illegal net immigrants almost certainly do
not have the same settlement patterns as those given by the forms
obtained from legal immigrants. Hence allocations among states are almost
certainly distorted. The misallocation within states is mitigated to the
extent that illegal immigrants were counted as being foreign born in the
1970 census and to the extent that settlement patterns have not changed
over the decade. However, even such a qualitative assessment is risky
because of the two-stage allocation procedure and because there are few
data to provide the basis for judgement. In addition, to the extent that
illegal immigrants were not counted in the 1970 census, the base for the
updates is relatively too small. This comment applies to all places that
were underenumerated in 1970, regardless of whether those not counted
were illegal aliens.

The net result of these considerations is that the population data for
states with heavy concentrations of illegal immigrants are subject to
downward bias. The problem for individual places, including such large
cities as New York, Los Angeles, and Houston, may be severe. These
places are thought to attract a large fraction of the illegal immigrants to
the United States, yet the estimating procedure has no mechanism to
take account of such new immigrants. Since these cities (and others) may
be constrained by the 145-percent rule (see Appendix E), the gain in
GRS revenue that would occur if immigrants were properly allocated
would be significant. It must be noted, however, that in spite of these
obvious shortcomings we can think of no better procedures for allocating
net immigrants. Until the Immigration and Naturalization Service
obtains adequate data on legal emigrants and immigrants and until
illegal migration becomes numerically trivial or better techniques are
developed to estimate it at the national and local levels, adequate
estimates cannot be produced.


5.1e PER CAPITA MONEY INCOME

Income is a complicated concept. Different measures of income can be
constructed, depending on the concept adopted. The Census Bureau [...]
or average total money income of residents in a given area at a given
point in time. Total money income is the sum of six components: wage
and salary income; nonfarm self-employment income; farm self-employment
income; social security and other retirement income payments;
public transfer payments, including assistance payments; and other
income sources, including interest, dividends, unemployment insurance,
etc. Data on money income can be easily obtained from household
surveys such as the Current Population Survey and from decennial censuses.
However, many of these data are difficult to obtain from administrative
records, which form the data base from which estimates of postcensal
changes in income are developed. In updating county per capita income
the Census Bureau relies on adjusted estimates of county farm income
and other income components (other than wages and salaries) produced
by the Bureau of Economic Analysis (BEA). The only available data for
subcounty estimates are the adjusted gross income (AGI) figures on the
IRS individual income tax returns; the BEA county estimates for the
remaining components of income are apportioned to subcounty areas.

The conceptual basis for the BEA income estimates is that of personal
income in the national income accounts. The national accounts measure
income generated by various kinds of economic activity; personal income
is the income of all residents of an area from all sources (including
in-kind payments and imputed items). It includes income received not only
by individuals but by quasi individuals (nonprofit institutions, private
noninsured welfare funds, and private trust funds). The Census Bureau's
money income is a statistical construct designed to be susceptible to
measurement in household interviews. The two concepts, money income
(Census Bureau) and personal income (BEA), are not congruent, and
converting BEA personal income estimates into estimates of components
of total money income requires difficult adjustments of unknown
reliability.

Farm income is especially difficult to estimate. The Census Bureau
defines farm self-employment income as the gross income received from
operation of a farm minus production expenses. The BEA farm income
estimate measures income arising from the current year's production in
the farm sector. Thus the BEA gross farm income includes cash receipts
from farm marketing of crops and livestock, payments to farmers under
the several government support programs, the value of food and fuel
produced and consumed on the farms, the gross rental value of farm
dwellings, and the value of net change in inventories of all crops and
livestock.

The first two items, cash receipts from marketing of crops and
livestock and payments to farmers under the several government programs,
are the most important components of gross farm income, and they are
also included in the money income concept of the Census Bureau.
However, with the exception of a few states, annual data on cash receipts
for crops are available only at the state level. The available data on cash
receipts must therefore be disaggregated for counties. These
disaggregations are made by prorating current cash receipts by crop
according to the past receipts measured by the last quinquennial census of
agriculture. This procedure presents a data problem because 3 or 4 years
after an agricultural census a small area such as a county may have
shifted its production from one crop to another. A crop that was
important at the time of the census of agriculture may be far less important
within a short time period because of changing market conditions. An
excellent example is soybean production.^ Less dramatic production
shifts, while more subtle, may destroy the accuracy of the process of
apportioning a state estimate among counties. The problem is aggravated
when farm income is a large proportion of a county's total income.

Inventory adjustments present another difficult issue for the
measurement of farm income. The Census Bureau's money income concept does
not make allowance for inventory changes. If inventories increase during
a given time interval, the Census Bureau's income measure could well be
negative, since sales or receipts would be down with expenses constant,
while the BEA's measure of net farm income could show a positive total
because it makes allowances for inventories. Omission of inventory
changes produces erratic shifts in the Census Bureau's estimates of county
farm income, a measure notorious for fluctuations. Substantial shifts in
farm income (or any type of income) are desirable in the data series when
they correspond to real shifts in production. Part of the problem with
the Census Bureau's farm income results from a faulty conceptual base.

^Neither the BEA's personal income concept nor the Census Bureau's money income
concept adequately measures income as a return to a factor of production. The issue is
most serious with respect to self-employment income. All proprietary establishments
require labor and capital inputs, but the final profit or loss figure represents a net sum. If
the returns to labor and capital were clearly identified and measured as a capital return or
a labor return, an improved income measure would result. This income measure would
undoubtedly differ from current accounting income measures.

^In many areas of the country, cotton and soybeans can be planted interchangeably. They
are close substitutes in production, and production changes occur rapidly in response to
price changes. If the price of cotton is high, farmers plant cotton. If the price of soybeans
is high, farmers plant soybeans. If there is any one point well established in the literature,
it is that farmers are extremely responsive to changes in the relative prices of their crops
(Schultz, 1964).


[...] adjustments are performed anyway (Bureau of Economic Analysis,
1977).

The suitability of a particular concept of income depends upon the
uses to which it is put. The general revenue sharing formulas use money
income in several ways. In the five-factor formula for states, the product
of population and the reciprocal of per capita income serves as a
measure of "relative poverty." In the three-factor formula for states and in
the county and subcounty formulas, "need" is measured by the inverse
of per capita money income. Total money income (estimated as the
product of the population and per capita income estimates) is used in
the county and subcounty formulas to measure the relative financial
ability of a county or subcounty government to collect taxes. In contrast,
in both the three-factor and five-factor formulas the relative financial
ability of a state government to collect taxes is measured by total personal
income. The Panel has not considered which of the two income
concepts is more appropriate for each of these uses, except to note that
neither the money income concept of the Census Bureau nor the personal
income concept of the BEA is ideally suited to represent the variables
above. For example, both income measures are only partial indicators of
a government's financial ability to collect taxes because they fail to
reflect revenue sources, such as motor fuel taxes in tourist-oriented states
like Vermont or gambling in Nevada (Advisory Commission on
Intergovernmental Relations, 1971). Also, as measures of "need" or "relative
poverty," both income measures fail to reflect area differences in the
cost of living, the types of services needed by area residents, or the
income distribution in an area (Office of Federal Statistical Policy and
Standards, 1978, p. 30).

Considering the weak conceptual basis of the Census Bureau's money
income measure and the complex and not inexpensive adjustments
required to produce updates of county money income, the Panel
recommends that the Census Bureau seriously consider not producing
postcensal per capita money income estimates for counties. Alternatively,
the Census Bureau could consider simpler procedures for making
updates; some possibilities were suggested in section 4.2. If tests against
the 1980 census show that the current methods or one of the simpler
methods yields highly accurate estimates, then the Census Bureau may
wish to continue making postcensal per capita income estimates for
counties. Otherwise, users should rely on BEA estimates of per capita
personal income by place of residence rather than on per capita money
income.

^The BEA income estimates involve the addition of these components, which are difficult
to measure, but the adjustment for compatibility with Census Bureau income [...]
involves subtracting these components from the BEA measure.

A separate argument for using BEA personal income estimates is their
conceptual linkage to the national income and product accounts.
Movements in the level of economic activity are monitored through the national
income and product accounts system. Policy decisions associated with
economic activity rely on such concepts as gross national product,
personal income, or one of the other accounts. If it is assumed that the BEA
local data are as reliable as or more reliable than the money income data,
a point that requires testing (and such testing would admittedly be
difficult), BEA data have an edge due to their consistency with other
recognized measures of economic activity.

Reliance on the BEA estimates is not viewed by the Panel as a panacea.
No tests of personal income estimates have been performed, and their
accuracy has not been measured. However, the Census Bureau county
money income estimates draw heavily on the BEA estimates, so errors in
the latter are likely to be present in the former. Estimation of county
income, under either concept, is difficult for those components such as
farm income, for which good area data are not available. Subcounty
postcensal income estimates should not be produced on either a personal
income or money income basis, because the concepts and methods are
too complex and the communities too numerous to produce reliable
income estimates. If a mid-decade census is taken, subcounty income
updates could be produced on a quinquennial basis.


5.2 CRITIQUE OF COMBINATIONS OF METHODS: 

ISSUES OF UNIFORMITY AND AVERAGING 

5.2a ALTERNATIVE METHODS OF ESTIMATION

There are three general kinds of procedures that could be used to obtain
local estimates: (1) using results from the most recent decennial census
(without updating), (2) using sample data to produce direct or synthetic
estimates, and (3) using auxiliary data according to a model. The first
procedure is used at the state level for determining representation in the
electoral college, which changes only when decennial census tabulations
are published. The second procedure is used for estimates of variables
collected on a sample basis in the decennial census; it was used to
obtain the 1970 census estimates of per capita income. The
third procedure is used by the Census Bureau to compute postcensal
estimates of population and per capita income. Choices among
ratio-correlation, administrative records, and component method II estimates
pertain only to the selection of the best representative of this category of
estimates.

It is also possible to suggest a fourth procedure, which would involve 
an optimal combination of the second and third (and, possibly, the first) 
procedures. One way of combining sample estimates and estimates based 
on auxiliary data (“auxiliary estimates”) would be simply to average 
them. For example, when 1970 population estimates for the 11 PSUs in
the Current Population Survey (CPS) with populations of more than 2 million
were obtained by averaging sample estimates from the October 1969 and 
January, April, July, and October 1970 surveys, the average error was 
2.0 percent. These sample estimates could be averaged with selected 
auxiliary estimates to produce an optimal combination, perhaps using 
empirical Bayes techniques. Such averaging would be a significant step, 
since these 11 areas included just over 25 percent of the nation’s 1970 
population. A second way of combining sample data with auxiliary esti- 
mates is by the regression-sample data procedure originally formulated 
by Hansen et al. (1953) and applied to the population estimation problem 
by Ericksen (1974). Here a regression equation using auxiliary estimates
and other auxiliary information is computed by using sample estimates,
in this case obtained from the CPS, of the variable in question (the
dependent variable). If only auxiliary estimates are used, this regression
equation estimates the optimal weighting allocation among the auxiliary 
estimates. Examples of how 1973 and 1975 population estimates could 
have been improved by such a procedure are given below. 

Disregarding combinations of procedures for the moment, it is clear 
that the first procedure should be favored whenever the most recent de- 
cennial census provides more accurate estimates of current population 
or income than do available postcensal estimation procedures. The ac- 
curacy criterion or loss function needs to be specified, but if it is squared 
relative error, then the procedure should be favored when the variability 
of rates of change is less than the mean squared relative error of available 



updating procedures. For example, if one wanted population and per
capita income estimates for 1971, it seems intuitively reasonable to use
the 1970 census counts. For a later year it also seems plausible that
1970 census counts might be more reasonable for some units, particularly
those with small populations and few data. Moreover, there may be
situations in which the first procedure would be favored for estimates for
income but not for population.
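The choice between carrying the census count forward and using a postcensal estimate can be sketched as a comparison of two mean squares, assuming squared relative error is the loss function. The function names and the numeric values below are hypothetical illustrations, not figures from the Panel's analysis.

```python
def mean_squared(values):
    """Mean of the squared values in a non-empty list."""
    return sum(v * v for v in values) / len(values)


def prefer_census_counts(relative_changes, relative_errors):
    """Return True when the unadjusted census count is expected to do better.

    relative_changes: population change since the census for each area,
        expressed as a proportion of the current population.
    relative_errors: relative errors of the postcensal estimator for
        comparable areas (for example, from a past evaluation).
    """
    return mean_squared(relative_changes) < mean_squared(relative_errors)


# Hypothetical values: small areas one year after the census.
changes_one_year = [0.01, -0.02, 0.015]
errors_of_estimator = [0.03, -0.04, 0.05]
use_census = prefer_census_counts(changes_one_year, errors_of_estimator)
```

With these illustrative inputs the census counts would be preferred, because the areas have changed less than the estimator typically errs.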

The second procedure would be used in two circumstances. The first
is when sample data are sufficient to provide accurate estimates for
local areas (as in the case described immediately above). The second is
when sets of areas with common characteristics could be combined and
a common estimate formed. For example, one could use CPS data to
obtain an estimate of per capita income for “central cities under 250,000
population in the Northeast.” Different categories of local areas could
be considered, and the accuracy of estimates could be assessed by
variance computations and other techniques.

The third procedure, which is the one used by the Census Bureau,
would be chosen when the auxiliary information is available and there
is so much variation among local areas in a category that combined
sample estimates have large errors. The auxiliary information is available
from vital statistics, school enrollment data, income tax records, and
various other sources, which vary by state. The problem with this set of
procedures is that there is no satisfactory way of evaluating their accuracy
except by conducting special censuses. The Census Bureau and the state
agencies that conduct such tests usually use evaluations from preceding
intercensal periods to verify accuracy. This can be misleading when
relationships among variables change from one time period to another.

5.2b UNIFORMITY

The specific comments in this section pertain to population estimation,
but the underlying ideas also apply to the generation of income
estimates. Although there are some variations by state, the general method
used to produce preliminary estimates of county populations is to
compute an equally weighted average of the AR and CM II estimates. This


In this instance, variability of rates of change should be measured by the average squared
change in population, with the change expressed as a proportion of the true current
population.

As was noted in Chapter 1, these are the estimates used for determining general revenue
sharing allocations; see Appendix A, section 3.1.

or other demographic characteristic of the population. For subcounty
areas, AR estimates are used, since the data for computing the other
estimates are not consistently available. Thus, although one procedure
may be particularly good for larger areas and another particularly
good for smaller areas, one or the other of the procedures, or an average
of them, is applied to all areas because the Census Bureau uses a
uniform procedure throughout.

Several features of the Census Bureau’s approach bear examination.
One of these is the use of equal weights for averaging the auxiliary
estimates. If auxiliary estimates are of unequal accuracy, then unequal
weights can produce more accurate estimates than equal weights. As an
extreme example, in the most exhaustive test of county estimates
conducted by the Bureau to date (in which 2,586 county estimates in 42
states were compared with the 1970 census), it was found that the RC
procedure, most accurate among the four procedures tested, gave better
results (in terms of average percent difference) than any equally weighted
average of two, three, or four estimates. (See Bureau of the Census
(1973b, Table C), but note that the term “regression” there refers to the
ratio-correlation method.)

A second feature that bears examination is the uniformity constraint.
By relaxing this constraint the Bureau could improve the accuracy of its
estimates. There are four ways of relaxing uniformity:

1. Different kinds of data could be used for making estimates for
different local areas in the same state. For example, different regression
equations (using different sets of independent variables) could be applied
for different counties within a state.

2. Counties from different states but with comparable data sources
could be estimated by a single regression equation (as was done by
Ericksen (1974)). Even among counties with comparable data series,
separate regressions might be determined for counties differing
according to region, size, rate of growth, age structure, or other characteristics.

3. Different methods may be used for different local areas. For
example, additional data sources are available for many large cities, so
that alternative estimates could be prepared. These alternative estimates
could be averaged with the administrative records method estimates
currently used by the Bureau for subcounty areas.

4. Different mixes (weighted averages) of methods could be used for
different local areas. For example, the accuracy of component method II
estimates drops more rapidly than does that of the administrative records
method as population size decreases. When component method II and
administrative records method estimates are averaged for counties, the
weight assigned to the component method II estimates could be smaller
for small areas than for large ones.
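The fourth option, a mix of methods that varies with area size, can be illustrated with a short sketch. The linear weighting rule, the 100,000 threshold, and the 0.5 ceiling are assumptions chosen only for illustration; the text does not specify how the weights should depend on population.

```python
def cm2_weight(population, large=100_000, max_weight=0.5):
    """Hypothetical weight for the component method II estimate.

    The weight rises linearly with population and is capped at
    max_weight once the population reaches the 'large' threshold.
    """
    return max_weight * min(population, large) / large


def blended_estimate(ar_estimate, cm2_estimate, population):
    """Weighted average of the AR and CM II estimates for one area."""
    w = cm2_weight(population)
    return (1.0 - w) * ar_estimate + w * cm2_estimate


# A large county receives the equal-weight average; a very small one
# relies almost entirely on the administrative records estimate.
large_county = blended_estimate(1000.0, 1100.0, population=200_000)
small_county = blended_estimate(1000.0, 1100.0, population=0)
```

In practice the weights would presumably be estimated from data, for instance by separate regressions for size classes, rather than fixed by a rule of this kind.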


5.2c EVALUATION OF THE CENSUS BUREAU’S APPROACH TO
WEIGHTING ESTIMATES

This section evaluates the use of equal weights to compute 1975 population
estimates for counties. The relative accuracy of alternative weighting
schemes, obtained by using the regression-sample data procedure
(Ericksen, 1974; Fay, 1979; Gonzalez and Hoza, 1978), can be judged
against 130 special county censuses.

In this analysis we evaluate the use of equal weights by comparing the
accuracy of the estimates of 1975 population provided by equally
weighted averages of 1975 AR and CM II estimates with the accuracy of
differentially weighted averages of these estimates. In 1975 the Bureau’s
preliminary county estimates were derived from the average of
administrative records and component method II estimates. If we define $X_1$ as
the AR estimate and $X_2$ as the CM II estimate, the Bureau’s method can
be written as $Y = .50X_1 + .50X_2$. Four other auxiliary estimates were
available for counties in all states: the respective 1975/1970 ratios of the
numbers of Medicare recipients, numbers of school children, numbers of
income tax exemptions, and income tax returns. Our evaluation consisted
of selecting the best combination of these six auxiliary estimates, using
regression with the PSU estimates of population growth (from the CPS)
serving as the dependent variable to compute weights.

All simple squared correlations ($r^2$) and multiple squared correlations


The computations for this test were carried out at our request by David Word at the
Population Division of the Census Bureau.

As was described in section 1.2a, the preliminary county estimates for 1975 were derived
as the sum of the revised estimate for 1974 plus the equally weighted average of two
estimates of change from 1974 to 1975, obtained as the difference between the 1975 and
1974 AR estimates and the difference between the 1975 and 1974 CM II estimates. (For
preliminary county estimates in some states, the difference between the 1975 and 1974
estimates by a third, locally used, method was averaged equally with the differences in
the AR and CM II estimates.)


($R^2$) for each of the 15 pairs of auxiliary estimates are presented in Table
5.1. No combination of three estimators produced a higher multiple
correlation than the best pair of estimators, administrative records and
component method II. If one looks first at the multiple correlations,
AR and CM II explained 27.4 percent of the variance in the CPS estimates.
This is scarcely better than the 27.2 percent of variance explained by the
AR estimate alone, however. The small size of this improvement is
explained by the similarity of the simple correlations with the CPS estimates
(r = .522 for administrative records and .504 for component method II)
and the extremely high correlation (r = .940) between these two
auxiliary estimates. We also note that the observed correlations between the
auxiliary estimates and the CPS estimates are shrunk toward zero because
much of the variance of the CPS estimates arises from random sampling
error and hence cannot be explained by the auxiliary estimates.

Because the multiple correlation is so close to the simple correlation, the 
choice of “best” estimate is ambiguous, but preference could be given to 
the two-variable equation because of the observed increase in explained 

TABLE 5.1 Correlations of Auxiliary Population Estimates With Sample
Population Estimates for PSUs From the CPS, 1975

                                    Simple                      Multiple
                                    Correlations                Correlations
Variable                            (Squared)      Variables    (Squared)

X1, administrative records          .272           X1, X2       .274
X2, component method II             .254           X1, X3       .272
X3, ratio of IRS exemptions         .252           X1, X4       .273
X4, ratio of IRS returns            .246           X1, X5       .273
X5, ratio of school enrollment      .181           X1, X6       .272
X6, ratio of Medicare recipients    .166           X2, X3       .263
                                                   X2, X4       .267
                                                   X2, X5       .255
                                                   X2, X6       .258
                                                   X3, X4       .253
                                                   X3, X5       .252
                                                   X3, X6       .253
                                                   X4, X5       .257
                                                   X4, X6       .246
                                                   X5, X6       .222

Note: The PSUs from the CPS were weighted according to the size of the stratum being
represented: hence the larger self-representing PSUs had the largest weights.

Source: Computations from the Bureau of the Census provided by David Word (private
communication, March 23, 1979).

variance and because the presence of within-PSU sampling error did damp
the observed correlations. Computing two regression equations, one using
the AR estimate as the single independent variable and the other using
AR and CM II estimates in a two-variable equation, we obtain

    $Y = .040 + .976X_1$,              $r^2 = .272$

    $Y = .027 + .766X_1 + .223X_2$,    $R^2 = .274$.

In both cases, county estimates were computed for all counties in the 
United States. Because the sum of the county estimates was slightly 
greater than the estimated national population, each estimate was 
multiplied by 0.985 to make the sum of county estimates agree with the 
national total. 
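The two steps just described (applying the fitted equation, then scaling to the national total) can be sketched as follows. The sketch assumes that the auxiliary estimates enter the equation as ratios of 1975 to 1970 population and that the coefficients of the two-variable equation are .027, .766, and .223; the county figures are hypothetical.

```python
def two_variable_ratio(x1, x2):
    """Fitted growth ratio from the AR ratio (x1) and the CM II ratio (x2).

    Coefficients are those of the two-variable equation as read from
    the text; treating x1 and x2 as 1975/1970 ratios is an assumption.
    """
    return 0.027 + 0.766 * x1 + 0.223 * x2


def control_to_total(estimates, national_total):
    """Scale every estimate by one factor so that the sum equals the total."""
    factor = national_total / sum(estimates)
    return [e * factor for e in estimates], factor


# Hypothetical counties: (1970 census count, AR ratio, CM II ratio).
counties = [
    (50_000, 1.04, 1.06),
    (200_000, 0.98, 0.97),
    (8_000, 1.10, 1.20),
]
raw = [base * two_variable_ratio(x1, x2) for base, x1, x2 in counties]

# The national total is set here so that the factor reproduces the
# 0.985 adjustment quoted in the text.
controlled, factor = control_to_total(raw, national_total=0.985 * sum(raw))
```

Because a single factor is applied to every county, the adjustment changes levels but not the relative sizes of the county estimates.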

Table 5.2 shows a comparison of the 130 estimates for counties that 
had special censuses with their enumerated populations. Notice that the 
actual results support our predictions based on the correlation and re- 
gression analysis. The two-variable regression equation produced the 
best results, though not by much, with the one-variable regression equa- 
tion coming in second, giving nearly identical results to the administrative 
records estimate alone. The equally weighted average of ar and cm ii 
estimates was less accurate overall than the ar estimate alone or either 
of the regression estimates. Here, use of equal weights detracts from 
overall accuracy, a result similar to that observed for 1970 when the ratio- 
correlation estimates were compared to all possible equally weighted 
averages of four estimates (Bureau of the Census, 1973b, Table C). 

It will be noted that the cm ii estimates decreased the accuracy most 
for small counties. This reinforces our suggestion that the Bureau would 
do well to weight procedures differently for different types of counties. 
Component method II does well for large counties and can give improved 
results to those obtained for administrative records alone. The computa- 
tion of separate cps-based regression equations for large and small 
counties might provide guidance on how to produce such stratified esti- 
mates, but to our knowledge such experimentation has not been done. 

The reader may note that the term “weighted average” is being used loosely, to include
both use of negative weights for variables and use of the constant term in the average. The
weighted averages may be constructed to satisfy various constraints (e.g., no constant term,
nonnegative weights, or weights summing to 1.0), but we do not find compelling motivation
for these constraints. Fay (1979) notes an analogy between the sum of the weights being less
than 1.0 and the shrinkage phenomenon arising in Stein-James estimators. We also note that
although least-squares is the criterion used here to estimate the weights, other criteria
pertaining to alternatives (discussed in section 3.1) may also be used.

TABLE 5.2 Average Percent Differences for Alternative 1975 Population
Estimates Tested Against 130 County Censuses

                                           Counties With 1975 Population of:
                                 All                    5,000 to
Type of Estimator                Counties    100,000+   100,000     Under 5,000

Component method II alone        6.42        2.31       5.41        [...]
Administrative records alone     4.14        2.03       3.87        [...]
Equally weighted average of
  component method II and
  administrative records         4.32        1.79       4.03        [...]
Regression, one variable (AR)    4.12        2.03       3.78        [...]
Regression, two variables
  (AR, CM II)                    4.01        1.82       3.73        [...]

Note: Estimates refer to July 1, 1975. The county censuses against which the estimates
were compared were taken between July 1, 1974, and December 31, 1976, and were
interpolated or extrapolated to July 1, 1975, from the April 1, 1970, counts. Percent
difference for each county equals estimate (as of July 1) minus adjusted special census count
(interpolated or extrapolated to July 1, 1975), expressed as a percent of the adjusted
special census count. Average percent difference was calculated as the arithmetic mean of
percent differences disregarding sign.

Source: Computations from the Bureau of the Census provided by David Word (private
communication, March 23, 1979).
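The accuracy measure used in Table 5.2 can be written out in a few lines. This is a minimal sketch of the definition given in the table note (signed percent difference per county, then the mean of absolute values); the function names and sample numbers are illustrative only.

```python
def percent_difference(estimate, census_count):
    """Signed difference of estimate from census count, as a percent of the count."""
    return 100.0 * (estimate - census_count) / census_count


def average_percent_difference(estimates, census_counts):
    """Arithmetic mean of the percent differences, disregarding sign."""
    diffs = [
        abs(percent_difference(e, c))
        for e, c in zip(estimates, census_counts)
    ]
    return sum(diffs) / len(diffs)


# Two hypothetical counties: one overestimated by 5 percent,
# one underestimated by 5 percent.
example = average_percent_difference([105.0, 190.0], [100.0, 200.0])
```

Because signs are disregarded, errors in opposite directions do not cancel; the measure reflects typical error size, not bias.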


As was noted above, the fallacy of arbitrary averaging was also
demonstrated by 1970 data. For that time, four auxiliary estimates
(ratio-correlation, component method II, composite, and vital rates) were
available. The average percent difference of the ratio-correlation
estimates was 4.6 percent, less than the average percent difference of any
equally weighted average of two, three, or four estimates (Bureau of the
Census, 1973b, Table C). When those four auxiliary estimates were added
to three ratios (births, deaths, and school enrollment) in the same kind of
exercise that we have just reported, the best combination of variables was
composed of births, deaths, school enrollment, and the ratio-correlation
estimates. For this, $R^2 = .428$, and the regression equation was

    Y = .085 + .745 (ratio-correlation) + .214 (school enrollment) [...]
        .045 (deaths) - .0[...]

When the estimates obtained from this equation were made for [...]
counties in 42 states, the average percent difference obtained was [...]
percent, and the number of large errors (10 percent or more) was 194,
compared with 264 large errors obtained from ratio-correlation alone.
No improvements were obtained by adding the other three auxiliary
estimates to the equation or by substituting them for births, deaths, or
school enrollment.

5.2d IMPROVING THE PRECISION OF SAMPLE DATA

Data from surveys such as the CPS and the Annual Housing Survey
should be used more in the postcensal estimation program, both in
producing and in evaluating estimates. The usefulness of CPS data for
postcensal population estimates could be further enhanced by certain
changes in the CPS design. One recent design change made by the Census
Bureau is the monthly collection of age, race, and sex data for each
household member. This change will allow more data to be pooled to provide
better yearly sample estimates. The Panel suggests four additional
changes:

1. Data on central city or suburban location as well as identification of
county of residence should be collected. These data would facilitate the
computation of separate regression equations for different types of
counties as well as separate equations for central cities or other types of local
areas.

2. The CPS is currently designed to minimize the variances of national
and state unemployment and employment estimates. Research should be
done to see if the CPS could be redesigned to improve the accuracy of
population and income estimates for local PSUs without substantially
increasing the variances of the state employment and unemployment
estimates.

3. Within-PSU samples should be selected in such a way as to facilitate
estimation of within-PSU variance, provided such redesign does not
substantially increase the variances.

4. If the CPS sample were enlarged, particularly in non-self-representing
areas, precision of the estimates would be increased; however, this would
involve substantial expense and may not be practicable.

Note: The descriptions in this paper are based on the authors’
understanding of the methods used by the Bureau of the Census to estimate
population over the period 1970-1977.

The Census Bureau is continually refining its procedures in (usually)
minor ways, and such changes are noted wherever possible. Nevertheless,
because of the ongoing modifications and because of the great complexity
of the methodology, the methods as practiced by the Census Bureau and as
described below may differ in minute details.

The generous and indispensable assistance of Census Bureau staff is
gratefully acknowledged, in particular, that of David Word, Frederick
Cavanaugh, Mary Kay Healy, Jennifer Peck, Richard Irwin, Jerome
Glynn, David Galdi, Joseph Knott, Edward Hanlon, Marianne Roberts,
Barbara van der Vate, Richard Engels, Louisa Miller, Sharon Baucom,
Frances Barnett, and Joel Miller. They discussed the Bureau’s
methodology with the authors, and several of them reviewed earlier drafts of this
appendix. Final responsibility for the accuracy of the descriptions rests
with the authors.

INTRODUCTION AND OVERVIEW

Postcensal estimates refer to a date (past or current) following a decennial
census and use that census and possibly earlier censuses as a point of
departure. To understand postcensal population estimation methods for
small areas, it is necessary first to understand those for the larger units of
population: counties, states, and the nation as a whole. The reason for
this is that the Census Bureau prepares its postcensal estimates by first
making the national estimate. Then estimates for the 50 states and the
District of Columbia are made and controlled (forced to sum) to the
national total. Subsequently, county estimates are controlled to a state total,
and subcounty estimates to a county total.
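The top-down controlling described above can be sketched as repeated proportional scaling. Proportional adjustment is an assumption made here for illustration; this overview says only that lower-level estimates are forced to sum to the higher-level total. The area names and values are hypothetical.

```python
def control(estimates, total):
    """Scale a dict of estimates by one factor so that they sum to total."""
    factor = total / sum(estimates.values())
    return {name: value * factor for name, value in estimates.items()}


# The national estimate is made first, then states are controlled to it,
# then the counties of each state are controlled to the state figure.
national = 1000.0
states = control({"A": 620.0, "B": 400.0}, national)
counties_in_a = control({"A1": 300.0, "A2": 310.0}, states["A"])
```

One consequence of this design is that any error in a higher-level total is passed down to every area beneath it.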

Essentially, two kinds of methods (component and regression) are used.
The component method calculates separately three elements of population
dynamics: net natural increase (number of births minus deaths),
migration (net inmigration, including immigration), and changes in “special
populations” not reflected in symptomatic data, namely, group quarters
populations. These individual components are then aggregated to yield an
estimate of population change.

In the regression method, equations are constructed to relate observed
population changes to observed changes in other “symptomatic” data that
are available and considered relevant to population changes. Subsequent
observed (postcensal) changes in symptomatic data are then transformed
by the equations to yield estimates of postcensal changes in population.
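The logic of the regression method can be illustrated with a one-variable least-squares fit. This is a simplified sketch, not the Bureau's ratio-correlation procedure (which uses several symptomatic series); the data values and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Step 1: between two censuses, relate observed population change to the
# observed change in a symptomatic series (for example, school enrollment).
symptom_change = [0.95, 1.00, 1.05, 1.10]
population_change = [0.96, 1.00, 1.04, 1.08]
intercept, slope = fit_line(symptom_change, population_change)


def estimate_population_change(postcensal_symptom_change):
    """Step 2: transform a postcensal symptomatic change with the equation."""
    return intercept + slope * postcensal_symptom_change
```

The method rests on the relationship fitted in the earlier period still holding after the census, which is the assumption questioned elsewhere in the report.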

Postcensal estimates of the total U.S. population are made using a
component method. This procedure is described in Part 1 of this appendix.

The state population estimates are derived by averaging the results of
three methods: component method II (CM II), administrative records
method (AR), and ratio-correlation method (RC). Component method II
and administrative records are both variations of component methods.
They differ only in estimation of net migration: CM II relies on changes in
school enrollments, while AR uses matched individual federal income tax
returns and treats net immigration separately. Ratio-correlation is a
regression method. State procedures are discussed in Part 2.

County estimates (discussed in Part 3) are generally produced from
methods similar to those used in state estimates. However, in some states,
county estimates use additional information, such as data on driver's license
registrations or new housing units.

Finally, methods for estimating subcounty populations are described in
Part 4. With a few exceptions the procedures are similar to the
administrative records method used at the state level.

Estimation methods are described here in the statistical tradition, in
that parameters are introduced and the objective of the estimation
procedures is accurate estimation of the parameters. A difficulty in
describing the estimation procedures in this way is the lack of well-defined
stochastic or demographic accounting models underlying the procedures.
This occurs because the descriptions written by the Bureau of the Census
outline their procedures and their objective but fail to specify in detail the
models underlying the procedures.

A statistical model should be consistent: if each parameter in the model 
were perfectly estimated, the objective described by the model (here, total 
population) would be perfectly estimated. It is not permissible to omit 
parameters entirely, even if data to estimate them are not available. For 
example, a model for postcensal change in total U.S. resident population 
should not exclude a component for change in the number of “illegal” im- 
migrants, even though satisfactory data may be lacking. This component 
does not have to appear as a separate entity — it may be incorporated into 
one or more other components — but it must not be omitted entirely. 
While it is permissible to use estimators that fail to coincide with the 
parameters as to geography or time of reference, the model itself must be 
consistent and well specified. 

The deviation of an estimate from its parameter is referred to as error. 
The sources and structure of error will be discussed below for the various 
postcensal estimation methods used by the Bureau of the Census. It 
should be recognized that a major, for many areas the major, source of er- 
ror in the estimate of total postcensal population is undercoverage (under- 
count) in the decennial census. On the other hand, undercoverage (for 
small areas) is a minor component of error for the estimates of postcensal 
change in population. For this reason, discussion of the sources and struc- 
ture of error will generally omit undercoverage as a source of error. 


PART 1 U.S. POPULATION 
1.1 INTRODUCTION 

The resident population of the United States includes residents of the 50 
states and the District of Columbia. It does not include residents of Puerto 
Rico and the outlying areas under U.S. sovereignty or jurisdiction, armed 
forces stationed in foreign countries, and other American citizens residing 
outside the United States. Postcensal estimation of this total resident 
population during the period 1970-1977 is described below with respect to
methodology (section 1.2), sources of data and errors (section 1.3), and
error structure (section 1.4). Apportionment of the estimated total by age,
race, and sex is discussed in section 1.5.

1.2 METHODOLOGY

The Bureau of the Census makes postcensal estimates of the U.S. resident
population by estimating components of population change since the
previous decennial census (1970). These components, to be discussed in
the following sections, include natural increase, net immigration of U.S.
armed forces from abroad, and net civilian immigration.

The change in total population since the 1970 census is estimated by the
sum of the three components of change. The estimate of postcensal
population is then obtained by adding the estimate of change to the 1970
population count.
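The national accounting identity just described can be written as a short function. This is a minimal sketch; the argument names and the sample figures are illustrative and are not actual counts.

```python
def postcensal_population(census_1970, births, deaths,
                          net_armed_forces_immigration,
                          net_civilian_immigration):
    """1970 census count plus the three estimated components of change."""
    natural_increase = births - deaths
    change = (natural_increase
              + net_armed_forces_immigration
              + net_civilian_immigration)
    return census_1970 + change


# Hypothetical figures (in thousands) for illustration only.
estimate = postcensal_population(
    census_1970=1000,
    births=50,
    deaths=30,
    net_armed_forces_immigration=5,
    net_civilian_immigration=10,
)
```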


1.2a Natural Increase

Natural increase equals the number of births minus the number of deaths.
The National Center for Health Statistics (NCHS) provides reports of these
numbers. Until 1970 (1960) adjustments were made for estimated
underregistration of births (deaths). These adjustments are no longer made
because the amount of underregistration is believed to be small and
because of the difficulty of correctly apportioning the imputed births to
subnational areas.

1.2b Net Immigration of Armed Forces From Abroad

This component is estimated by the following total:

(a) number of armed forces abroad in 1970
- (b) number of armed forces abroad on the estimate date
- (c) number of deaths to armed forces abroad since previous census
+ (d) net change in number of recruits from Puerto Rico who are with
the armed forces.

These numbers are obtained from the Directorate of Information of the
U.S. Department of Defense and from the Army, Navy, Air Force,
Marine Corps, and Coast Guard.
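The armed forces component is a signed sum of the four items (a) through (d) listed above. The sketch below restates that total; the argument names and sample values are hypothetical.

```python
def net_armed_forces_immigration(abroad_1970, abroad_on_estimate_date,
                                 deaths_abroad, net_puerto_rico_recruits):
    """Items (a) - (b) - (c) + (d) of the armed forces component."""
    return (abroad_1970
            - abroad_on_estimate_date
            - deaths_abroad
            + net_puerto_rico_recruits)


# Hypothetical figures: a drop in forces stationed abroad shows up as
# net immigration into the resident population.
component = net_armed_forces_immigration(
    abroad_1970=1000,
    abroad_on_estimate_date=700,
    deaths_abroad=20,
    net_puerto_rico_recruits=5,
)
```

Deaths abroad are subtracted because those persons left the overseas population without returning to the resident population.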


1.2c Net Civilian Immigration

Net civilian immigration is estimated by the following total:

(a) alien immigration
+ (b) parolee immigration
+ (c) net arrivals from Puerto Rico
+ (d) net movement of civilian citizens associated with the U.S. government
- (e) other net emigration (including migration of U.S. citizens and
aliens not included in (c) and (d) above).

Alien immigrants are those nonrefugee aliens accepted for permanent
residence by the Immigration and Naturalization Service (INS). The INS
classifies an individual as an immigrant when it grants permanent
residence status. Since this does not necessarily coincide with the date of
physical entry into the United States, the Census Bureau reallocates
immigrants for whom the data are available to date of entry. Most [...]
and Indochinese who change their status from nonimmigrant to
permanent resident alien can be reallocated to date of entry, but generally other
nonimmigrant aliens cannot be reallocated when they adjust to immigrant
status. Many other individuals (notably students) who enter with
nonimmigrant visas later adjust to immigrant status, but these people are not
reallocated to date of entry.

The Bureau of the Census does not attempt to include illegal
immigrants or aliens temporarily residing in the United States in its
estimate of net immigration. The latter group includes aliens with
temporary visas (students, visitors, diplomats) and numerous agricultural
workers from Mexico and the British West Indies working under
contract.

The classification “parolee” refers to nonimmigrant aliens [...]
permanent resident aliens who are allowed to remain in the United States
permanently. Parolees consist almost entirely of refugees from Indochina,
Cuba, Hong Kong, and communist countries of eastern Europe. The alien
immigration figures do not include the parolees. Counts of [...]
parolees are provided by the Immigration and Naturalization Service and
the Task Force for Indochina Refugees of the U.S. Department of Health,
Education, and Welfare (HEW).

Net arrivals from Puerto Rico are estimated on the basis of passenger
statistics. The Puerto Rico Planning Board collects data from a[...]
carriers on passengers entering and leaving Puerto Rico. The difference
between the number of departures from Puerto Rico and the number of
arrivals to Puerto Rico is used to estimate the net migration from Puerto
Rico to the United States. The implicit assumption is that the movement
between Puerto Rico and countries other than the United States is
insignificant. To reduce the fluctuations that can arise from the seasonal [...]

States is most reliable for those individuals affiliated with the U.S.
government. This group includes overseas civilian citizen employees of the
federal government as well as overseas citizen dependents of federal 
employees and servicemen. Data from the U.S. Department of Defense 
and Federal Civilian Work Force Statistics are used to estimate the total 
change in the number of civilian citizens affiliated with the U.S. govern- 
ment, and their dependents, who are overseas during the postcensal 
period. The natural increase (births, estimated from reports of Depart- 
ment of Defense hospitals) over the period is subtracted from this total 
change. The negative of the residual is taken as the estimate of net civilian 
citizen immigration over the period. Deaths in the overseas civilian citizen 
population are ignored. Also ignored are civilian citizens overseas who 
leave federal employment but remain overseas and civilian citizens living 
overseas who accept federal employment. 
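The residual calculation for civilian citizens affiliated with the government can be sketched as follows. The function and argument names are hypothetical; the arithmetic follows the description above (total change in the overseas group, less births, with the sign reversed).

```python
def net_civilian_citizen_immigration(overseas_at_census, overseas_now,
                                     births_overseas):
    """Negative of (change in overseas population minus overseas births).

    As in the text, deaths overseas and changes of employment status
    among citizens who remain overseas are ignored.
    """
    total_change = overseas_now - overseas_at_census
    residual = total_change - births_overseas
    return -residual


# Hypothetical figures: the overseas group shrank by 100 despite 20
# births, so 120 persons are taken to have returned.
returned = net_civilian_citizen_immigration(
    overseas_at_census=1000,
    overseas_now=900,
    births_overseas=20,
)
```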

“Other net emigration” refers primarily to persons not affiliated with 
the federal government who move from the United States to a foreign 
country. Since 1957 no statistics have been collected on the number of per- 
sons who have permanently moved out of the United States. Estimates are 
based on 1960-1970 data on overseas payments from the Social Security 
Administration and data reported to the United States by foreign coun- 
tries on numbers of immigrants into these countries. The Census Bureau 
assumes that the level of emigrants has remained constant since 1970 (see 
Warren and Peck, 1975). 


1.3 SOURCES OF DATA AND ERROR 

The net civilian immigration component is subject to greater error in 
estimation than natural increase or net immigration of armed forces for 
the following reasons: 

1. Illegal alien migrants are not identified as such, and no one knows 
what fraction of them are counted as residents. Good estimates of the ex- 
tent of such undocumented alien immigration are lacking, and the 
magnitude of this error is difficult to estimate. 

2. Net arrivals from Puerto Rico are estimated from airline passenger 
data. The determination of a small net flow from large gross flows in and 
out of Puerto Rico of approximately equal magnitude (residual process) is 
not conducive to accurate measurement. Net immigration from other U.S. 
possessions is not estimated at all. 

3. Estimates of emigration are markedly understated because statistics 
on permanent arrivals from the United States are provided by few coun- 
tries, and when they are available, data are generally poor and variable in 
coverage. The official estimates of emigration in the 1970s are 36
year, but Warren and Peck (1975), using demographic analysis, e: 
that over 100,000 persons from the foreign-born population emigi 
year. The total number of emigrants, native and foreign born, 
higher. 

Error in the estimate of population at the time of the censu 
significant for estimating total population. Net undercoverag< 
decennial census has received substantial discussion (see Burea 
Census, 1973a). Recognition of the undercoverage problem has b 
development and use of the inflation-deflation method, discussed 
section 1.5. This method reduces the impact of undercoverag( 
estimates of postcensal population change of age groups but doe 
feet the estimates of postcensal change for the population as a v 

Birth underregistration is believed to be small (see Bureau of 
sus, 1973c). 

1.4 STRUCTURE OF ERRORS 

The errors in the national components of change and in the total 
population estimates are important for subnational estimates bee 
small-area estimates are constrained by the national estimates ir 
ways. For example, the national estimate constrains the sub 
estimates by broad age groups. Postcensal estimation of state pof 
involves separate estimation of the population aged 65 and ow 
estimates in each state are scaled so they sum to the national esi 
population 65 and over. Since age composition varies from state 
error in the estimate of the national population 65 and over affec 
differentially. 

1.5 APPORTIONMENT OF NATIONAL POPULATION BY AGE, RACE, AND SEX

Postcensal estimates of national population by age, race, and sej 
tained by using a method called inflation-deflation. First, the 1 
mate of total population including military overseas is adjusted 
mated census undercoverage by age-race-sex class. The under 
rates are based on set D of the Bureau of the Census (1973a). Tf 
forces are assumed to be completely counted. Second, births 
deaths) are adjusted for underregistration by race-sex. This is th 
tion” part of the procedure. 

Next, the components of population change are broken down 
race-sex categories. The following methods and data sources are 
1. Resident births, deaths: National Center for Health Statistics (NCHS)
gives data on age-race-sex.

2. Deaths to armed forces abroad: Data on total military deaths are ob- 
tained from each branch of the armed forces. The Census Bureau esti- 
mates deaths to armed forces overseas by assuming that the proportion of 
these latter deaths for each state is the same as the proportion of total 
military deaths for the state. 

3. Alien immigrants: Immigration and Naturalization Service has data 
on age, sex, and country of birth. Race is apportioned according to the 
pattern observed in the 1970 census for immigrants by country of birth 
over the period 1965-1970. 

4. Parolees and refugees: Cubans were assumed all white, with age-sex
distributions the same as Cubans in point 3. Classification of Vietnamese
was based on counts of the HEW Task Force on Indochina Refugees.

5. Net arrivals from Puerto Rico: Puerto Ricans were assumed all white 
and are currently classified by age-sex according to the age-sex distribu- 
tions for Puerto Ricans living in the United States in 1970 (based on cen- 
sus data). In the years prior to 1977 the distributions were based on 
surveys of inmigrants and outmigrants from Puerto Rico. 

6. Civilian citizen immigrants affiliated with the United States: These 
persons are distributed according to the observed 1970 census age-race- 
sex distribution of this population overseas. 

7. Other emigrants: Social security beneficiaries are assumed to be over
65 and are classified by race-sex on the basis of social security data. 
Canada provides the Bureau of the Census with age-sex-race distributions 
of American migrants to Canada. Migrants to other countries are as- 
sumed to have the same age-race-sex distributions as migrants to Canada. 


The final step consists of “deflating” the estimates of each age-race-sex
group by multiplying each estimate by the corresponding undercoverage
rate in the 1970 census. The same rates (set D of the Bureau of the
Census, 1973a) are used to deflate as were used to inflate, but the rates are
applied to age groups rather than cohorts. For example, if $R_5$ and $R_{10}$ were
the estimated 1970 undercoverage rates for white male children aged 5
and 10 in 1970, then in estimating the 1975 population the 1970 base
population of white males aged 5 would be inflated by $(1 - R_5)^{-1}$. However,
the estimate of persons aged 10 in 1975 based on the cohort aged 5 in
1970 would be deflated by $1 - R_{10}$.
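
The inflate-then-deflate bookkeeping for a single cohort can be sketched as follows. This is only an illustration under stated assumptions: the function names, the example rates, and the single `net_change` figure (standing in for the adjusted births, deaths, and migration applied to the cohort) are hypothetical, and the subsequent adjustment to the national total is not shown.

```python
def inflate(census_count, undercoverage_rate):
    """Adjust a census count upward for estimated net undercoverage."""
    return census_count / (1.0 - undercoverage_rate)


def deflate(adjusted_estimate, undercoverage_rate):
    """Return an adjusted estimate to the census level of coverage."""
    return adjusted_estimate * (1.0 - undercoverage_rate)


def project_cohort(census_count, rate_base_age, rate_attained_age, net_change):
    """Carry one cohort forward (hypothetical helper).

    The base count is inflated with the rate for the age the cohort had
    at the census, the net change over the period is added, and the
    result is deflated with the rate for the age group the cohort has
    reached, not the rate of the cohort itself.
    """
    inflated = inflate(census_count, rate_base_age)
    return deflate(inflated + net_change, rate_attained_age)


# Illustrative numbers only: 9,500 counted at age 5 with a 5 percent
# undercoverage rate, a net loss of 100, and a 2 percent rate at age 10.
estimate_1975 = project_cohort(9500, 0.05, 0.02, -100.0)
```

Because the deflation rate belongs to the age group reached rather than to the cohort, the 1970 coverage pattern by age is carried into the postcensal estimate.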

After each age-race-sex class is deflated, further adjustment forces the 
total over subgroups to equal the national total obtained without inflation 
or deflation. The adjustment is necessary because inflation-deflation is 
consistent for age groups but not for cohorts. Finally, the overseas i
component is subtracted from the national total. 

The rationale of the inflation-deflation method flows from the assumption
that undercoverage rates for age groups are stable over time. The
ultimate purpose of the method is to provide accurate estimates of
postcensal population change by age, and the strategy for achieving this is
to preserve in the postcensal estimates the 1970 undercoverage structure
for age groups. As a result, the postcensal age distribution does not preserve
the 1970 undercoverage structure for cohorts. For example, the estimated
numbers for immigrants not present in 1970 (hence not subject to
undercount) are nonetheless deflated. In the absence of inflation or deflation,
direct application of the age-specific rates of change (birth, death,
migration) to the various age groups would preserve in the postcensal estimates
the 1970 undercoverage structure for cohorts but would not preserve the
undercoverage structure across age groups.


PART 2 STATE POPULATIONS 
2.1 INTRODUCTION 

The Census Bureau derives postcensal estimates of state populations by
averaging the results of three methods: component method II (CM II),
ratio-correlation method (RC), and administrative records method (AR).
These methods have the following features in common: (1) Current data
are used to estimate population change since the previous census (or
a recent postcensal estimate). (2) Change in the 65 and over population is
estimated separately, by using Medicare enrollment data. (3) Change in
the population living in group quarters is treated separately.

Both CM II and AR are component methods. In using these methods,

postcensal population = base population
                      + births
                      − deaths
                      + net migration
                      + changes in group quarters population

' Except for the provisional estimates, which are typically based on just two method; 
ample, the Census Bureau made the provisional estimates by adding to the revi 
estimate the average change between 1975 and 1976 for component method II at 
variable ratio-correlation estimate. In addition, component method II was not 
estimating Alaska population beginning with 1975 (see Bureau of the Census (1' 
discussion). 
The principal components of population change are net natural increase
(births minus deaths) and net migration. ^ Special populations, i.e., those 
living in group quarters, are handled separately because changes in size of 
the special populations are not adequately reflected in the data used for 
the rest of the population. State special populations have included 
residents of military barracks, large Job Corps centers, institutions (men- 
tal hospitals, correctional facilities, etc.), college dormitories, and (for 
1975) Vietnamese in resettlement camps. 

Both AR and CM II estimate net natural increase similarly and migration
differently. To estimate migration, CM II uses school enrollment data for
internal migration and immigration, while AR matches Internal Revenue
Service individual income tax returns for internal migration and treats
immigration separately.

In the ratio-correlation method (RC), regression equations are used to
relate population changes to changes in symptomatic data or indicator
variables (see Morrison, 1971; Purcell and Kish, 1979). The RC method
proceeds in two steps: (1) construction of a regression equation using data
from a base observation period and (2) use of this equation to estimate
postcensal population from current symptomatic data.

To describe the methods in detail, it will be useful to develop notation,
which will be introduced as needed, with a summary appearing as a
special note at the end of Appendix A. A convention in use here is that a
person who would be 65 or older on the estimate date is “elderly”; all
other persons are “young.” Methods CM II, RC, and AR will each be
discussed in turn, with attention to methodology, sources of data and
error, and error structure.


2.2 CM II METHODOLOGY
2.2a Introduction and Overview 

It is convenient to let T refer to time in years, with T = 0 the time of
reference of the previous census and T = t the time of reference of the
present estimate. The interval $(T_1, T_2]$ is the period since time $T_1$ up to
and including time $T_2$.

^From 1970 to 1975 the change in the total population of all the states increased about 8.6 
percent from births, decreased about 5.0 percent from deaths, and increased 1.2 percent 
from net migration. These rates vary substantially among the states with respect to net 
migration, ranging from —8.1 percent in the District of Columbia to +20.8 percent in 
Florida (see Bureau of the Census, 1976).


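$$
\begin{aligned}
\mathrm{RESPOP}(t) ={} & \mathrm{POPY}(0) && \text{(April 1, 1970, young population)} \\
& + \mathrm{BIR}(0, t) && \text{(births)} \\
& - \mathrm{DEAY}(0, t) && \text{(deaths to young)} \\
& + \mathrm{NGQMIGY}(0, t) && \text{(net migration of non-group quarters persons)} \\
& + \mathrm{GQPOPY}(0, t) && \text{(net change of group quarters young)} \\
& + \mathrm{NETMOVY}(0, t) && \text{(net movement of young from military quarters to non-group quarters)} \\
& + \mathrm{POPE}(t) && \text{(elderly population),}
\end{aligned}
$$

where RESPOP(t) is the postcensal population and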

where (the following notation refers to a particular state)

RESPOP(T): resident population at time T;

POPE(T): resident elderly population at time T;

POPY(T): resident young population at time T;

BIR($T_1$, $T_2$): number of resident births in $(T_1, T_2]$;

DEAY($T_1$, $T_2$): number of resident deaths to young in $(T_1, T_2]$;

NGQMIGY($T_1$, $T_2$): number of young persons newly taking up non-group
quarters residence in the state over interval $(T_1, T_2]$ minus the number
of young moving out from non-group quarters in the state either to
another state or to group quarters in the state over interval $(T_1, T_2]$;

GQPOPY($T_1$, $T_2$): number of young persons newly taking up group
quarters residence in the state over interval $(T_1, T_2]$ minus the number
of young moving out from group quarters in the state either to another
state or to non-group quarters in the state over interval $(T_1, T_2]$;

NETMOVY($T_1$, $T_2$): excess of young persons moving out of military
barracks in the state over those moving into military barracks in the
state over $(T_1, T_2]$.


^The model used by the Census Bureau is slightly different in description but eq 
operation to that described here. In particular, the Census Bureau estimates the t 
quarters young population at time t and adds this to the estimated April 1, 1970, ) 
group quarters population (= popy(O) minus group quarters young on April 1, 1 
the other components (bir(0, t), etc.). 

“'For state population estimates produced in the first half of the decade this compo 
red to the net movement to the armed forces from the civilian populations rather 1 
military barracks population from non-barracks populations. 
More notation will be introduced as needed. When it is necessary to be
precise in referring to a particular state i, the argument T or $T_1, T_2$ will be
replaced by T; i or $T_1, T_2$; i, for example, RESPOP(T; i) or BIR($T_1$, $T_2$; i).


2.2b The Elderly Population: POPE(t) 

To estimate POPE(t), an estimate of the change in the number of elderly is
added to the count of POPE(0). The change in the elderly population from
time 0 to t is based on the change in the number of Medicare enrollments.
Since almost the entire population 65 and over is enrolled in Medicare, the 
change in number of enrollments in a state reflects both increases from in- 
dividuals just turned 65 and inmigration of elderly persons and decreases 
from deaths and outmigration of elderly persons. 

The Medicare data base is discussed further in section 3.1. Because
time 0 refers to April 1 while the Medicare data refer to July 1, the
Medicare enrollments for time 0 are estimated by linear interpolation.
Thus, for example, change in Medicare enrollments for a state over the
period April 1, 1970, to July 1, 1974, is estimated by

$$\mathrm{MEDCARE}(74) - \{.25\,\mathrm{MEDCARE}(69) + .75\,\mathrm{MEDCARE}(70)\},$$

where MEDCARE(x) is the count of Medicare enrollments for the state in
year x.
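
A minimal sketch of this interpolation, assuming enrollment counts keyed by two-digit year as in the formula; the function names and the enrollment figures are hypothetical and serve only to show the arithmetic.

```python
def medicare_change(medcare, end_year, census_year=70):
    """Change in Medicare enrollment from April 1 of the census year to
    July 1 of end_year.

    April 1 lies three-quarters of the way from July 1 of the prior year
    to July 1 of the census year, hence the .25 and .75 weights.
    """
    april_base = 0.25 * medcare[census_year - 1] + 0.75 * medcare[census_year]
    return medcare[end_year] - april_base


def elderly_population(pope_0, medcare, end_year):
    """POPE(t): the census count of elderly plus the enrollment change."""
    return pope_0 + medicare_change(medcare, end_year)


# Illustrative enrollment counts for one state (not real data).
enrollments = {69: 1000, 70: 1040, 74: 1200}
change = medicare_change(enrollments, 74)
pope_t = elderly_population(1050, enrollments, 74)
```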


2.2c Births, Deaths to Young: BIR(0, t), DEAY(0, t)

Estimates of these two components of natural increase are based primarily
on data obtained from state vital statistics offices. These reports of deaths
give breakdowns by race but not by age. To estimate DEAY(0, t), it is
necessary to differentiate deaths to persons under 65, and for this,
National Center for Health Statistics (NCHS) data are used, since NCHS
provides age by race breakdowns of total national deaths.

Estimation of DEAY(0, t) will be described for times t prior to 1979.
Beginning with the 1979 population estimates the Census Bureau will
estimate deaths to the young directly on the basis of reported deaths by
age by state, and the following procedure will no longer be used. Some
temporary notation will be useful: let subscripts r, a, i refer to race, age,
state and let the argument x refer to the year ending on December 31.
Race r takes on two values (white, black and other), as does age a (young,
elderly). Consider the notation
$D_{ri}(x)$: reported number of deaths to race r in state i for year x (obtained
from state vital statistics offices);

$d_{ra}$: NCHS estimate of the nationwide death rate for persons of race r
and age group a over the interval (0, t];

$P_{rai}$: count of race r, age a population of state i on April 1, 1970.

The reported number of deaths to race r in state i over the period (0, t] is
denoted by $D_{ri}$. Because x refers to December 31 while time t refers to July
1 and time 0 refers to April 1, some interpolation is used to obtain $D_{ri}$.
The estimates are also adjusted to the national total. For example, $D_{ri}$
with t referring to July 1, 1973, satisfies

$$D_{ri} = K[.75D_{ri}(70) + D_{ri}(71) + D_{ri}(72) + .50D_{ri}(73)],$$

where K is chosen so that the sum of $D_{ri}$ over states equals the national
total.

The estimate of deaths in state i over (0, t] to persons of race r and age a
will be denoted by $D_{rai}$. To obtain this estimate, the estimates $D_{ri}$ are
apportioned among the two age groups by using the national death rates:

$$D_{rai} = D_{ri} \cdot \frac{d_{ra} P_{rai}}{d_{ry} P_{ryi} + d_{re} P_{rei}},$$

where a takes on the values e (elderly) and y (young).

The estimate DEAY(0, t; i) is then obtained by summing the estimates of
the young deaths:

$$\mathrm{DEAY}(0, t; i) = \sum_r D_{ryi}.$$
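
The three operations just described (interpolating calendar-year deaths, adjusting to the national total, and apportioning by age) can be sketched as below. The helper names and all figures are hypothetical; the weights are those of the July 1, 1973, example, and a single race group is shown for brevity.

```python
def interpolate_deaths(annual_deaths, weights):
    """Weighted sum of calendar-year deaths covering the interval (0, t]."""
    return sum(w * annual_deaths[year] for year, w in weights.items())


def scale_to_total(values, national_total):
    """Multiply every state's value by the constant K that makes the
    state values sum to the national total."""
    k = national_total / sum(values.values())
    return {state: k * v for state, v in values.items()}


def young_deaths(d_ri, rate_young, rate_elderly, pop_young, pop_elderly):
    """Share of a state's deaths assigned to the young, in proportion to
    the deaths expected from national rates and 1970 populations."""
    expected_young = rate_young * pop_young
    expected_elderly = rate_elderly * pop_elderly
    return d_ri * expected_young / (expected_young + expected_elderly)


# April 1, 1970, to July 1, 1973: three-quarters of 1970, half of 1973.
weights_1973 = {70: 0.75, 71: 1.0, 72: 1.0, 73: 0.50}
deaths = interpolate_deaths({70: 400, 71: 410, 72: 420, 73: 440}, weights_1973)
scaled = scale_to_total({"A": 100.0, "B": 300.0}, 800.0)
d_young = young_deaths(deaths, 0.01, 0.20, 80000, 10000)
```

Summing `young_deaths` over the race groups would give DEAY(0, t; i) for the state.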


The estimation of births BIR(0, t) is easier because all newborns are
young. Estimates of births provided by a state are nonetheless controlled
to national totals. Let

$B_i(x)$: state-provided estimates of births for state i in year x;

$B_i$: unadjusted estimate of births for state i over the interval (0, t];

$B$: NCHS estimate of the total number of births to U.S. residents over
the interval (0, t].

The estimate $B_i$ is obtained by interpolation. For example, with t referring
to July 1, 1973, $B_i$ satisfied (for the revised estimates)^

$$B_i = .75B_i(70) + B_i(71) + B_i(72) + .50B_i(73).$$

^For the provisional estimates, $B_i$ satisfied $B_i = .75B_i(70) + B_i(71) + 1.5B_i(72)$.


The estimates of BIR(0, t; i) are then obtained by adjusting the $B_i$ to the
national total; thus

$$\mathrm{BIR}(0, t; i) = B_i \cdot \frac{B}{\sum_j B_j}.$$




2.2d Changes in Special Populations: GQPOPY(0, t) and NETMOVY(0, t)

To estimate the change in special, or group quarters, populations of the 
young, the Bureau of the Census assumes that there is no net interstate 
movement of young persons living in group quarters, except for areas con- 
trolled by the federal government, including barracks populations of 
military installations, Job Corps centers in six states, and refugee camps 
for Vietnamese (in four states in 1975). In estimating substate popula- 
tions, additional special populations are considered. 

The net movement component is estimated on the basis of changes in 
the size of the total population living in barracks (including those in 
foreign countries). The total number of persons leaving the barracks is 
allocated among the states according to the state distributions of preser- 
vice residence reported in U.S. Department of Defense records. 


2.2e Non-Group Quarters Migration of the Young: NGQMIGY(0, t)

An essential element of CM II is the use of school enrollment data to
estimate NGQMIGY(0, t). The method will first be sketched and then described
in detail.

An estimate of the school-age population for time t is obtained by 
relating the school-age population to elementary school enrollment at time 
T = 0 (April 1, 1970) and applying this relationship to the school enroll- 
ment at time t. This estimate of the school-age population is then com- 
pared with the “survivors” of the school-age cohort (“expected” cohort 
size if there were mortality but no migration in (0, t]). The migration of 
the school-age population is estimated by the resulting difference between 
the estimated school-age population and the survivors of the school-age 
cohort. Dividing the estimated school-age migration by the school-age 
cohort minus one-half the deaths to the school-age cohort produces an 
estimate of the school-age migration rate. This estimated school-age 
migration rate is then adjusted to a migration rate for the young female 
population. The migration rate for the non-group quarters young
population is assumed equal to the rate for young females and then is applied to a
base population to yield the estimate of NGQMIGY(0, t).

The following approximations underlie the method: (1) Children start 
first grade in the calendar year of their sixth birthday. (2) No childr(
or skip one of grades 1-8. (3) No children move or drop out during the
school year while in grades 1-8. (4) Within each state, the proportion of
children aged 6.25-14.24 on April 1, 1970, who are enrolled in grades
1-8, is constant over time. (5) The difference between the average annual
migration rates of young females and of children aged 6.25-14.24 remains
constant over time in each state. (6) The migration rate for young females
in a state equals that of the non-group quarters young population.

The method proceeds in the following manner: 

Step I. Obtain school enrollment data directly from a state sou
estimate ENROL(0), and ENROL(t), where ENROL(T) is the sum total of
children in the state enrolled in the fall^ for grades 1-8 in public, pi
or special education schools for the school year beginning in the fall of the
calendar year preceding time T.

Step II. Estimate school-age population SCLPOP(t) according to

$$\mathrm{SCLPOP}(t) = \frac{\mathrm{SCLPOP}(0)}{\mathrm{ENROL}(0)} \times \mathrm{ENROL}(t),$$

where SCLPOP(T) is the school-age population, precisely, the population
aged 6.25-14.24 on April 1 of the calendar year containing time T.

Step III. Scale the estimates of SCLPOP(t) so that they sum to the
national estimate (described in Part 1).

Step IV. Estimate “expected” school-age population EXSCLPOP(t),
where EXSCLPOP(T) is the “expected” school-age population at time T if
there were births and deaths but no migration over the period (0, T]. The
estimate is made by adjusting the cohort counted in the 1970 census for
births and deaths over (0, T]. Births and deaths are estimated from
reported calendar year vital statistics. To allocate deaths to the school-age
population, the national period death rate is applied to the school-age
cohort. Denote the deaths to the school-age population over the period
(0, T] by SCLDEA(T), denote the number of children born since the
census who attain school age by time T by SCLBIR(T), and denote the size
of this cohort at time T by SCHLCHT(T). Thus

$$\mathrm{EXSCLPOP}(T) = \mathrm{SCHLCHT}(0) + \mathrm{SCLBIR}(T) - \mathrm{SCLDEA}(T).$$

Step V. Estimate the school-age migration rate SCLMIGRAT(t) according
to

$$\mathrm{SCLMIGRAT}(t) = \frac{\mathrm{SCLPOP}(t) - \mathrm{EXSCLPOP}(t)}{\mathrm{SCHLCHT}(0) - \tfrac{1}{2}[\mathrm{SCLDEA}(t) - \mathrm{SCLBIR}(t)]},$$

^For some states, April enrollments are used. 
where SCLMIGRAT(T), T > 0, is the period net migration rate for
population aged 6.50-14.49 at time T > 0, over the interval (0, T].

Steps VI-VIII, below, are designed to provide an estimate of the
non-group quarters young migration rate NGQMIGYRAT(t), where
NGQMIGYRAT(T), T > 0, is the period net migration rate for the non-group
quarters young population at time T > 0, over the interval (0, T].

To relate the migration rates for the school-age population to the young 
non-group quarters population, data for the period preceding T = 0 are 
used. Migration rates for young females are computed as an intermediate 
step to avoid difficulties attendant to military migration during this 
period. 

Step VI. Obtain estimates (using 1970 census data) of the school-age
and young female migration rates SCLMIGRAT(0), FEMIGYRAT(0), where
SCLMIGRAT(0) is the period net migration rate for population aged
5.00-14.99 at T = 0 over the interval (−5, 0] (i.e., over the preceding 5
years) and FEMIGYRAT(0) is the period net migration rate for young
females at time T = 0 over the interval (−5, 0] (i.e., over the preceding 5
years). The denominators of these period rates are the respective 1970
census populations.

Step VII. Estimate the young female period net migration rate at time
t, FEMIGYRAT(t), according to

$$\mathrm{FEMIGYRAT}(t) = \mathrm{SCLMIGRAT}(t) + [\mathrm{FEMIGYRAT}(0) - \mathrm{SCLMIGRAT}(0)](t/5),$$

where FEMIGYRAT(T), T > 0, is the period net migration rate for the
young females at time T > 0 over the interval (0, T].

Step VIII. Estimate the non-group quarters young period net migra- 
tion rate NGQMIGYRAT(t) by assuming NGQMIGYRAT(t) = FEMIGYRAT(t). 

Step IX. Estimate net non-group quarters young inmigration
NGQMIGY(0, t) by multiplying the estimate of the migration rate
NGQMIGYRAT(t) by the estimate of the base population. Here the
migration base population is estimated according to

$$\mathrm{NGQPOPY}(0) + \tfrac{1}{2}[\mathrm{BIR}(0, t) - \mathrm{DEAY}(0, t) + \mathrm{NETMOVY}(0, t)].$$
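
The chain of Steps II, IV, V, VII, and IX for one state can be sketched as follows. This is a rough illustration, not the Census Bureau's program: the function names are hypothetical, the inputs are invented, and Step III (scaling the school-age estimates to the national total) is omitted.

```python
def school_age_population(sclpop_0, enrol_0, enrol_t):
    """Step II: carry the 1970 population-to-enrollment ratio forward."""
    return sclpop_0 / enrol_0 * enrol_t


def expected_school_age_population(cohort_0, sclbir, scldea):
    """Step IV: cohort size with births and deaths but no migration."""
    return cohort_0 + sclbir - scldea


def school_age_migration_rate(sclpop_t, exsclpop_t, cohort_0, scldea, sclbir):
    """Step V: estimated migrants divided by the cohort adjusted for
    one-half of its net loss over the period."""
    return (sclpop_t - exsclpop_t) / (cohort_0 - 0.5 * (scldea - sclbir))


def young_female_migration_rate(sclmigrat_t, femigyrat_0, sclmigrat_0, t):
    """Step VII: add the 1965-1970 difference in rates, prorated by t/5."""
    return sclmigrat_t + (femigyrat_0 - sclmigrat_0) * (t / 5.0)


def net_migration_of_young(rate, ngqpopy_0, bir, deay, netmovy):
    """Steps VIII-IX: apply the young female rate to the base population."""
    base = ngqpopy_0 + 0.5 * (bir - deay + netmovy)
    return rate * base


# Invented figures for a single state, four years after the census.
sclpop_t = school_age_population(100000, 98000, 102900)
exsclpop_t = expected_school_age_population(100000, 2000, 500)
scl_rate = school_age_migration_rate(sclpop_t, exsclpop_t, 100000, 500, 2000)
fem_rate = young_female_migration_rate(scl_rate, 0.05, 0.04, 4)
ngqmigy = net_migration_of_young(fem_rate, 900000, 60000, 10000, 2000)
```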

2.2f Final Adjustments to CM II

The estimates of births, deaths, and elderly population are all scaled by
factors $\lambda_B$, $\lambda_D$, and $\lambda_E$ to sum to the respective national estimates
(discussed in Part 1). These factors are constant over the 50 states and the
District of Columbia. The estimates of young population for each state,

$$\mathrm{POPY}(0) + \lambda_B \mathrm{BIR}(0, t) - \lambda_D \mathrm{DEAY}(0, t) + \mathrm{NGQMIGY}(0, t) + \mathrm{GQPOPY}(0, t) + \mathrm{NETMOVY}(0, t),$$

are then scaled to equal the estimate of national young population. The
changes in the estimated young state population brought about by this
scaling are all attributed to the estimate of NGQMIGY(0, t). Estimates of
the other components are not altered.
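
A sketch of this last scaling step, assuming a proportional adjustment in which every state's young population is multiplied by the same factor. The dictionary keys and the figures are hypothetical, and births and deaths are taken to be already scaled to their national totals.

```python
def young_total(c):
    """Young population implied by one state's components."""
    return (c["popy0"] + c["bir"] - c["deay"]
            + c["ngqmigy"] + c["gqpopy"] + c["netmovy"])


def scale_young_populations(components, national_young):
    """Scale state young populations to the national estimate and book
    the whole change against net migration, leaving the other
    components untouched."""
    factor = national_young / sum(young_total(c) for c in components.values())
    adjusted = {}
    for state, c in components.items():
        change = young_total(c) * (factor - 1.0)
        adjusted[state] = dict(c, ngqmigy=c["ngqmigy"] + change)
    return adjusted


# Two invented states whose young populations sum to 4,000.
states = {
    "A": {"popy0": 900, "bir": 100, "deay": 20,
          "ngqmigy": 10, "gqpopy": 5, "netmovy": 5},
    "B": {"popy0": 2900, "bir": 200, "deay": 50,
          "ngqmigy": -60, "gqpopy": 10, "netmovy": 0},
}
adjusted = scale_young_populations(states, 4400)
```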

2.3 SOURCES OF DATA AND ERROR IN CM II 
2.3a Elderly Population: POPE(t) 

Certain characteristics of the Medicare data are pertinent here. Computer
files prepared annually as of July 1 by the Health Care Financing
Administration contain the state and county of the residential mailing address of
persons enrolled in Medicare.

Problems arise in three areas: coverage, multiple addresses, a 
ing. At the national level. Medicare enrollment is generally about ( 
the 1970 census count of the elderly (Bureau of the Census, 1973b^ 
is some disparity, however, between the census counts and the enr 
figures for some states, particularly Florida and Arizona. Various 
are excluded from Medicare (e.g., aliens who have resided in the 
less than 5 years), and other groups are only partially include 
retired federal employees are incompletely registered). A minor c 
problem arises from timing. Because of legislative requirements, 
as of a given date contain not only those aged at least 65 but als 
who will turn 65 during the month following the reference dab 
changes in Medicare enrollments reflect more closely the change 
population over age 64-11/12 than those in the population 65 ar 
the impact of this is minor, however. 

The problem of multiple addresses occurs when an elderly perso 
tains residences in more than one state. Such a person may re 
original enrollment mailing address for Medicare purposes but b} 
definition be considered to be living at a residence in anothe 
Changes in the number of these persons would adversely affect t 
mates of the states’ elderly populations. 

The timing problem derives from the delay in preparing the cc 
files. In order to be included in the Medicare files, a person r 
64-11/12 years old by July 1. The computer files are not updai 
released, however, until about April 1 of the following calendi 
which is the closing date for new registrations or changes of £ 
Thus, for example, the actual reference date of the Medicare re 
record for the July 1, 1976, elderly population is closer to April 1 
than to the desired July 1, 1976. Changes in the number of perse 
classified would induce error into the estimates. 
2.3b Births, Young Deaths: BIR(0, t), DEAY(0, t)

In making postcensal estimates of population prior to 1979, these two
components of net natural increase for the young population were
estimated on the basis of data provided by members of the Federal-State
Cooperative Program (FSCP). The FSCP obtains its figures from the
individual state vital statistics offices, and some error arises when national
mortality rates by age and race are used to apportion the FSCP counts of
total deaths in each state into estimates for the young and elderly
populations. Under the revised procedures used to estimate 1979 populations
(see section 2.2c) this source of error will be eliminated.


2.3c Change in Group Quarters Young: GQPOPY(0, t)

Two assumptions are used in estimating GQPOPY(0, t). The first is that
except for the military barracks for which data are available, Job Corps
centers, and refugee centers, there is no net migration of young group 
quarters residents into or out of the state. The error introduced by this 
assumption is believed to be small, since the change in number of out-of- 
state residents in college dorms is usually relatively small, as are changes 
in the number of residents living in barracks for which data are not 
available and other special populations whose changes are ignored (long- 
term inmates of hospitals and institutions). 

The statewide change in the size of military barracks populations is 
estimated by summing the changes estimated for all subcounty barracks 
populations. For the large (and some small) military barracks an estimate 
of the size of the barracks at time t is secured from the individual post 
commander, either directly or through a member of the FSCP (see Bureau
of the Census, 1973d, pp. 45-50). If these data are not available, data on 
the size of the whole installation are available, and the current barracks 
population is estimated to equal 


$$\text{(current installation size)} \times \frac{\text{(1970 barracks population)}}{\text{(1970 installation size)}}.$$


This alternative was used for about 12 states in 1976-1978 and for five or 
six states in 1979. 
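
The fallback ratio estimate and the statewide summation can be sketched as follows; the function names and the installation figures are hypothetical.

```python
def barracks_population(current_installation, barracks_1970, installation_1970):
    """Fallback estimate: apply the 1970 share of the installation that
    lived in barracks to the current installation size."""
    return current_installation * barracks_1970 / installation_1970


def statewide_barracks_change(installations):
    """Sum, over a state's installations, the change in barracks
    population since 1970. Each entry is (barracks_1970, barracks_now)."""
    return sum(now - base for base, now in installations)


# One installation lacks a direct barracks count and uses the ratio.
estimated_now = barracks_population(12000, 4000, 10000)
change = statewide_barracks_change([(5000, estimated_now), (2000, 1900)])
```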

The total change in group quarters young populations is thereby
estimated to equal the movement in barracks populations, Job Corps
centers, and refugee centers. This procedure utilizes the second
assumption: the number of deaths in this subpopulation is zero.



The possible sources of error here are described by reference to the steps
in the procedure outlined in section 2.2e.

Step I. School enrollment data are provided by members of the 
and by state education departments, based on figures to be supplie 
grade by county for public and nonpublic schools. The roughly one-t 
of the states who do not have public school fall enrollment data avail
use year-end data. 

Nonpublic school enrollments are reported (1) in some states by g 
or by county by grade, (2) in some states for total kindergarten thn 
grade 8 (here the Census Bureau tries to subtract kindergarten er 
ment; these data, published in education directories, are not verj 
curate), and (3) in some states (such as Texas) only for some areas (in 
case the parochial and private schools must be contacted in order tc 
tain enrollment figures; often the parochial school data can be obta 
from a single source, but other private schools must be contacted or 
one by an fscp member or other means). Even when the states re 
private school enrollment, the Census Bureau screens the data. 

Step II. Surprisingly, the estimate of ENROL(t) often exceeds that of
SCLPOP(t). The reasons include the following: (1) children have been
undercounted in the decennial census, (2) some children fail grades and
are too old to be included in the estimate of SCLPOP(t), (3) students
enrolled in special programs may be counted more than once, and (4)
children of migrants and children who transfer from one school to another
and are reported in both places are double-counted.

Step VI. The 1970 census included a question about prior residence in
1965. These data were used in estimating the number of migrants over the
5-year period for young females and for the school-age cohort.

2.4 ERROR STRUCTURE IN CM II 

The principal source of error in the estimates of postcensal change in state
populations resides in estimation of non-group quarters young migration,
NGQMIGY(0, t). Such error arises because (1) misreporting (or nonreporting)
of school enrollments introduces error into the estimates of the
proportions enrolled in school, (2) differential undercoverage in the decennial
census of the population under 14 adversely affects the estimates of the
proportions enrolled in school, introducing error into the estimate of
SCLMIGRAT(t) (see Step V above), and (3) the assumption and estimation
of an unvarying linear change between the migration rates for school-age
population and for young females are only rough approximations.

For a few states, notably Florida and Arizona, another significant 
source of error lies in the estimation of the change in the elderly popula- 
tion POPE(t) — pope(O). Errors arise from deficiencies in the Medicare 
data (see section 2.3a above). 

Error in the estimates of young deaths DEAY(0, t) is caused primarily by
age and residence misreporting on death certificates. A smaller source of
error lies in the adjustment to the national total by $\lambda_D$.

Errors in the estimates of births, caused by underregistration and 
misreporting of residence, are believed to be insignificant. Errors in the 
estimate of group quarters young migration are also believed to be 
generally insignificant, since in most states a very small proportion of the 
population lives in group quarters. 

2.5 METHODOLOGY FOR RC 
2.5a Introduction and Notation 

Ratio-correlation (RC) is a regression method, in which a state population
is divided into three parts: elderly, group quarters young, and non-group
quarters young. The elderly and group quarters young populations are
estimated as in CM II. In the case of non-group quarters young
populations, RC uses regression equations to estimate the fraction of national
non-group quarters young residing in each state. This fraction is then
multiplied by the estimate of national non-group quarters young
population, yielding an estimate of state non-group quarters young.


2.5b Elderly Population 

The elderly component is estimated just as in the component method II 
(see section 2.2b above). 


2.5c Group Quarters Young Population 

The RC estimate of group quarters young population in the base year is ob- 
tained from the census count for April 1, 1970. To this is added an 
estimate of the change in group quarters young population (both in bar- 
racks and in nonmilitary group quarters), which is derived just as in the 
component method II (see section 2.2d). Deaths to group quarters young
are ignored.


2.5d Non-Group Quarters Young Population

This component is estimated with the use of a regression equation. The
equation is obtained by the least-squares linear fit of the relative changes
in the state shares of national non-group quarters young population from
1960 to 1970 to the relative 1960-1970 changes in state shares of national
numbers of (1) students enrolled in elementary school, (2) federal
individual income tax returns, (3) registered passenger cars,^ and (4)
persons in the work force. The regression model has the form

\[
Y_i = \beta_0 + \sum_{r=1}^{4} \beta_r X_{ri} + \text{residual},
\]

where $\beta_0$ and $\beta_r$ ($r = 1, \ldots, 4$) are the coefficients (to be
estimated), and $Y_i$ is calculated by

\[
Y_i = \frac{\mathrm{NGQPOPY}(t; i) \big/ \sum_j \mathrm{NGQPOPY}(t; j)}
           {\mathrm{NGQPOPY}(0; i) \big/ \sum_j \mathrm{NGQPOPY}(0; j)},
\]

with NGQPOPY(T; i) equal to non-group quarters population of state i at
time T, and $X_{ri}$ having forms similar to $Y_i$ but with NGQPOPY replaced by
the predictor variables: observed numbers of students enrolled in
elementary school, federal income tax returns, etc.

The postcensal estimates of state non-group quarters young populations,
for time t later than April 1, 1970, are obtained by using the
estimated regression equation from above and substituting for the
predictor variables the relative changes in shares of four components
(students, tax returns, cars, work force) over the interval (0, t]. This
yields an estimate of the relative changes in the state shares of non-group
quarters young population. For each state this estimate is multiplied by
the April 1, 1970, share of non-group quarters young population, to provide
an estimate of the state’s share of the national non-group quarters young
population for time t. These estimates are then extrapolated 3 months to
pertain to July 1 and scaled to sum to unity. Finally, these estimates are
multiplied by the estimate of the national non-group quarters young
population.
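
The fit-and-apply sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (solve, fit_rc, rc_estimate) and the data layout are hypothetical, the least-squares fit is done through the normal equations rather than by whatever routine the Census Bureau used, and the 3-month extrapolation to July 1 is omitted.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_rc(y, X):
    """Least-squares coefficients [b0, b1, ...] for y_i = b0 + sum_r b_r * X[i][r].

    y[i] is the 1960-1970 relative change in state i's population share;
    X[i][r] is the relative change in its share of symptomatic series r.
    """
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    AtA = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    Aty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    return solve(AtA, Aty)


def rc_estimate(coef, X_post, base_share, national_total):
    """Apply the fitted equation to postcensal predictors (one row per state)."""
    # Predicted relative change in each state's share over (0, t].
    rel = [coef[0] + sum(b * v for b, v in zip(coef[1:], x)) for x in X_post]
    # Multiply by the April 1, 1970, share, then scale the shares to sum to unity.
    raw = [r * s for r, s in zip(rel, base_share)]
    total = sum(raw)
    # Multiply the scaled shares by the national estimate.
    return [national_total * v / total for v in raw]
```

Because the shares are rescaled to sum to unity before the final multiplication, the state estimates returned by rc_estimate always add up to the national figure supplied.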

^ This data series was dropped from use beginning with the 1975 estimates.


2.5e Total Resident Population

The total resident population for a state can now be estimated by adding
the estimate of non-group quarters young population to the estimates of
group quarters young population and of elderly population.


2.5f Complications in Regression Models

For this section the term “population” will be used to refer only to the
non-group quarters young population. The regression complications
relate to observed departures from the model of the predictor variables for
some states. In particular, in almost every southern state the changes from
1960 to 1970 in the distribution of federal income tax returns, passenger
car registrations, and to a lesser extent, the work force reflect increased
affluence rather than changes in the state share of population only. Thus
the deviations of regression-estimated non-group quarters young April 1,
1970, population from censal population counts have large positive values
for the southern states. This same phenomenon was observed for the
1950-1960 changes.

For symptomatic data V(T) referring to date T, the methodology
focuses on “area coverage ratios,” defined for state i as

\[
R_i(T) = \frac{V_i(T)/P_i(T)}{V(T)/P(T)},
\]

with notation

$R_i(T)$   area coverage ratio for variable V, state i, time T;

$V_i(T)$   value of variable V for state i, time T;

$P_i(T)$   population of state i, time T;

$P(T)$     $\sum_j P_j(T)$;

$V(T)$     $\sum_j V_j(T)$.

To improve the regression model, it is worthwhile to remove the effect of
trends in the area coverage ratios. The “expected coverage ratio” for 1970,
$R_i'(70)$, is then calculated as follows:

1. If $R_i(50) < R_i(60) < 1$, then $R_i'(70)$ is established by linear
extrapolation of the 1950-1960 trend, with a value of 1 as the upper limit;
i.e., $R_i'(70) = \min[1, 2R_i(60) - R_i(50)]$.

2. If $R_i(50) > R_i(60) > 1$, then $R_i'(70)$ is established by linear
extrapolation of the 1950-1960 trend, with a value of 1 as the lower limit;
i.e., $R_i'(70) = \max[1, 2R_i(60) - R_i(50)]$.

3. Otherwise, set $R_i'(70) = R_i(60)$.

If the trends in area coverage ratios are not being considered, the
predictor variable appearing in the regression equation for the
1960-1970 population change will be

\[
\frac{V_i(70)/V_i(60)}{V(70)/V(60)}.
\]

To account for the trends this variable is replaced by

\[
\frac{V_i'(70)/V_i(60)}{V'(70)/V(60)},
\]

where $V_i'(70) = V_i(70) \cdot R_i(60)/R_i'(70)$ and $V'(70) = \sum_j V_j'(70)$.
This replacement is in fact made for variables 2, 3, and 4 (see above)
when the regression coefficients are estimated. To apply the estimated
regression equation for estimation of postcensal population at time t later
than April 1, 1970, each of the symptomatic variables corresponding to

\[
\frac{V_i(t)/V_i(70)}{V(t)/V(70)}
\]

is replaced by

\[
\frac{V_i'(t)/V_i'(70)}{V'(t)/V'(70)},
\]

where $V'(t) = \sum_j V_j'(t)$ and $V_i'(t)$ is calculated as follows. First, the
“expected area coverage ratio” $R_i'(80)$ is calculated analogously to
$R_i'(70)$; e.g., if $R_i(60) < R_i(70) < 1$, then
$R_i'(80) = \min[1, 2R_i(70) - R_i(60)]$. Then $R_i'(t)$ is calculated by linear
interpolation between $R_i(70)$ and $R_i'(80)$, and
$V_i'(t) = V_i(t) \cdot R_i(70)/R_i'(t)$.

This use of area coverage ratios has been applied only for va 
Discussion can be found in the work of the Bureau of the Ce 
pp. 10-14). 
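
As a rough illustration of the three extrapolation rules and the postcensal adjustment, the following Python sketch may help. The function names and argument layout are hypothetical, and the interpolation assumes that t is expressed as years elapsed since April 1, 1970, within a 10-year span; neither detail is specified in the text.

```python
def expected_ratio(r_prev, r_curr):
    """Expected area coverage ratio one decade ahead.

    r_prev and r_curr are the observed ratios at two successive censuses,
    e.g. R_i(50) and R_i(60) when computing R_i'(70).
    """
    trend = 2.0 * r_curr - r_prev          # linear extrapolation of the trend
    if r_prev < r_curr < 1.0:
        return min(1.0, trend)             # rising toward 1: cap at 1
    if r_prev > r_curr > 1.0:
        return max(1.0, trend)             # falling toward 1: floor at 1
    return r_curr                          # otherwise carry the latest ratio


def adjusted_value(v_t, r_70, r_80_expected, years_since_1970):
    """V_i'(t) = V_i(t) * R_i(70) / R_i'(t), with R_i'(t) interpolated
    linearly between R_i(70) and the expected ratio R_i'(80)."""
    r_t = r_70 + (r_80_expected - r_70) * years_since_1970 / 10.0
    return v_t * r_70 / r_t
```

For a state whose ratio rose from 0.7 to 0.8, expected_ratio returns a value near 0.9; five years into the decade the interpolated ratio is about 0.85, so an observed symptomatic value is scaled down by roughly 0.8/0.85 to discount the expected growth in coverage.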

2.6 SOURCES OF DATA AND ERROR IN RC 


2.6a Elderly Population 
See section 2.3a.


2.6b Group Quarters Young Population
See section 2.3c.


2.6c Non-Group Quarters Young Population

Data on school enrollments were discussed in section 2.3d.

Information on individual income tax returns is made available to the
Census Bureau by the Internal Revenue Service.

Data on passenger automobile registration are provided by the State
Departments of Motor Vehicles and published by the Bureau of Public
Roads in Highway Statistics.

Data on the numbers of nonagricultural wage and salary workers are
provided by the U.S. Department of Labor and published annually in the
May issue of Employment and Earnings (see Bureau of Labor Statistics,
1978, p. 158, pp. 124-133). Estimates of the number of full-time
agricultural workers are based on data provided by FSCP members.
Unemployment figures are currently obtained from the Bureau of Labor
Statistics, which bases its figures on unemployment insurance data.

2.7 ERROR STRUCTURE IN RC

Most of the error in RC estimates of change in state populations derives
from estimation of change in non-group quarters young population. This
error arises in turn from error in the symptomatic data and from
inadequacy of the regression model. Specifically, the model may fit well
for a previous time period but predict poorly over the postcensal time
period. The methodology discussed in section 2.5f applies only to known
past departures from the model and not to current departures.

The comments in section 2.4 about error in estimating elderly and
group quarters populations apply here to error structure of RC as well.

2.8 METHODOLOGY FOR AR

2.8a Introduction and Overview

The administrative records method (AR) is a relatively new variation of the
component method for making postcensal population estimates. The
components of population change are derived analogously with component
method II (CM II), except for net migration. The elderly and special
(group quarters) populations are handled separately, and natural increase
is estimated identically. Net migration, however, is decomposed into net
internal migration and immigration from abroad. To estimate net internal
migration, individual federal income tax returns are matched across
years, and address changes noted. Immigration from abroad is estimated
from the records of the Immigration and Naturalization Service on
immigrants’ intended places of residence.

Another difference between AR and CM II lies in the “base year” used to
estimate change. While CM II always calculates changes since the last
census, AR calculates shorter (usually year-to-year) changes. The reason
for looking at shorter changes is the effort to obtain high match rates for
the income tax returns. This will be explained further in the following
sections.


2.8b Net Internal Migration 

This component is estimated by computing a net migration rate for each
state, based on state of residence reported on individual federal income
tax returns for 2 years, and then applying this rate to the estimate of the
non-group quarters population. To develop 1973 postcensal estimates, the
migration rate from 1970 to 1973 was estimated from the
calendar years 1969 and 1972 tax returns. The 1974 estimates were based
on returns filed in April 1973 and 1974, and the 1975 estimates were based
on returns filed in April 1973 and 1975. The 1976, 1977, and 1978
estimates were based on returns filed in April 1975 and 1976, 1976 and
1977, and 1977 and 1978, respectively.

The tax returns contain, for each filer, social security numl 
number of exemptions, and number of exemptions for blind 
old age. 

For each of the calendar years when the tax forms were used, a computer
file was constructed to retain the relevant information from the
returns. The returns were arranged in ascending order of the social
security number of the primary taxpayer.

No match is possible when the social security number on one
return does not appear in the file for the other year. Reasons for this
include the following: death; marriage; failure to earn sufficient income to
require filing; immigration from abroad; first entry into the labor force;
divorce, separation, or widowhood (which may result in filing under a
different social security number); and decisions by spouses to file jointly
in one year but separately in another. A valid match can only occur when
the social security number of the primary filer appears in both files. When
the state of residence^ is the same for both years, the filer (and any
person claimed as an exemption) is classified as a nonmover. When the state
of residence differs, the filer (and any person claimed as an exemption) is
classified as an interstate migrant.

^ A question about state of residence appeared in the 1972 and 1975 returns. In
other years, imputation procedures utilizing the mailing address on the return
are used to determine state of residence.

Because the elderly population is handled separately in AR, it is
advantageous to exclude the elderly from the calculation of the non-group
quarters young migration rate. Consequently, if any exemption is claimed
for old age or blindness (the two are not distinguishable in the computer
file), the entire tax return is excluded from consideration (i.e., treated
equivalently to a nonmatched return).

On the basis of the remaining matched returns the migration rate is
computed as

\[
\frac{(\text{number of exemptions on inmigration returns})
      - (\text{number of exemptions on outmigration returns})}
     {(\text{number of exemptions on nonmover returns})
      + (\text{number of exemptions on outmigration returns})},
\]

where “number of exemptions” refers to the tax return for the later of the
2 years. Except for minor complications (discussed in the following
paragraph), this rate is multiplied by a population base equal to the
number of young persons at the beginning of the period plus one-half the
sum of natural increase plus net movement plus net immigration from
abroad over the period. This product is the estimate of net non-group
quarters young internal migration.
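
The matching and rate calculation can be sketched as follows. The record layout (a dictionary keyed by the primary filer's social security number), the function names, and the choice to drop a return when the age-or-blindness exemption appears in either year are all assumptions made for illustration; the text does not say which year's exemption triggers the exclusion. The young-adult adjustment factor discussed next is not included.

```python
def migration_rate(year1, year2, state):
    """Net migration rate for `state` from two matched files of returns.

    year1, year2: dict mapping SSN of the primary filer to a tuple
    (state_of_residence, number_of_exemptions, has_age_or_blind_exemption).
    """
    ins = outs = nonmov = 0
    for ssn, (state2, exemptions2, flag2) in year2.items():
        if ssn not in year1:
            continue                      # no match possible
        state1, _, flag1 = year1[ssn]
        if flag1 or flag2:
            continue                      # treated like a nonmatched return
        # Exemptions are counted from the later year's return.
        if state1 == state and state2 == state:
            nonmov += exemptions2
        elif state2 == state:
            ins += exemptions2
        elif state1 == state:
            outs += exemptions2
    return (ins - outs) / (nonmov + outs)


def net_internal_migration(rate, young_start, natural_increase,
                           net_movement, net_immigration):
    """Apply the rate to the base: start-of-period young population plus
    one-half the sum of the other components of change."""
    base = young_start + 0.5 * (natural_increase + net_movement + net_immigration)
    return rate * base
```

The denominator counts everyone matched as resident in the state in the earlier year (nonmovers plus outmigrants), so the rate is expressed relative to the start-of-period matched population.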

The possible complications in thus calculating the migration rate have
been described by the Bureau of the Census (1976, p. 12) as follows:

Since migration patterns of young adults often differ from the remainder of the
population, a migration adjustment factor distinct for each State was introduced.
The rationale for the adjustment is that young adults are not represented on
matched returns in proportion to their population. Accordingly, by reasoning
analogous to that previously discussed in Component Method II, the net migration
rate for the 10-year period 1960-70 was calculated for females under age 65 in 1970
and was compared to that of the subgroup which excluded those 18 to 24 in 1970.
The algebraic difference between the two rates was the 10-year adjustment. For
shorter periods the migration adjustment differential was prorated. At the State
level, the annual adjustments range from -0.2 percent for West Virginia to +0.2
percent for Utah. The District of Columbia, however, receives an annual
adjustment of +0.6 percent.


2.8c Immigration From Abroad

Immigrants from abroad are not detectable by the matching technique
because they file tax returns only after entering the United States. The
estimated national number of immigrants is allocated to states according
to the immigrants’ declarations to the Immigration and Naturalization
Service. Emigrants are ignored. Parolees (see section 1.2c) recei
treatment. 

2.8d Other Components of Change 

These components include natural increase of young, changes in elderly
populations, and changes in group quarters populations and are
estimated as they are in CM II. While CM II considers changes over
(0, t_1], (0, t_2] (see section 2.2), AR focuses on (t_1, t_2]. To estimate
change over the interval (t_1, t_2], AR simply uses the difference between
the estimates of change over (0, t_1] and (0, t_2].^

The change in state population is then estimated by summing natural
increase, changes in group quarters populations, changes in the elderly
population, net internal migration, and immigration from abroad. The
estimates of change in state population are scaled so their sum equals the
change in the estimates of national population. As with CM II (see section
2.2f), the changes in the estimated young state population brought about
by this last scaling are all attributed to the estimate of net internal
migration of the non-group quarters young.

The postcensal estimates of state population are then obtained by adding
these estimates of population change to the estimates of population in
the base year.
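
A minimal sketch of the summing and scaling step is given below. It assumes that the scaling is a single proportional factor applied to every state's estimated change, which the text does not state explicitly (the actual rule is the one in section 2.2f); the function name and the component keys are hypothetical.

```python
def scale_changes(components, national_change):
    """Scale state changes to the national change and charge the
    difference to net internal migration.

    components: dict mapping state -> dict of component changes, which
    must include the key "net_internal_migration".
    Returns a new dict of the same shape with adjusted migration.
    """
    raw = {s: sum(c.values()) for s, c in components.items()}
    # ASSUMPTION: one proportional factor for all states.
    factor = national_change / sum(raw.values())
    adjusted = {}
    for s, c in components.items():
        out = dict(c)
        # The change brought about by scaling is attributed to migration.
        out["net_internal_migration"] += raw[s] * factor - raw[s]
        adjusted[s] = out
    return adjusted
```

After the adjustment the components for each state again sum to that state's scaled change, and the scaled changes sum to the national change.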

2.9 SOURCES AND STRUCTURE OF ERROR FOR AR 

Since postcensal estimation of state population under AR differs from CM
II with respect to the migration component only, the focus here is on
the use of individual federal income tax returns to estimate net migration.
The methodology rests on two implicit assumptions:

1. Migration patterns are the same for people who file income tax
returns as for those who do not (except for elderly and special populations,
which are treated separately).

2. The address listed on the tax form for each year is the place of
residence and is the relevant address for determining whether or not a
person has moved. For example, the filer could report place of residence
one year and place of business in another year.

^ Beginning with the 1978 estimates, the Census Bureau computed deaths to the
young and to the elderly over (t_1, t_2] directly rather than by taking
differences between those over (0, t_1] and (0, t_2]. The two procedures are not
equivalent because the cohorts of young (and elderly) at times t_1 and t_2 were
different, and what is really of interest are deaths over (t_1, t_2] to the
cohort defined with t_2 as the reference date.

The extent of error arising from failure of assumption 1 is not known.
About 99 percent of whites under 65 are included as exemptions on the tax
returns, but the filing rate for blacks under 65 is lower. Blacks in the
southern states have exceptionally low filing rates. Also, numerous
low-income persons are not included as exemptions when the head of
household fails to file a tax return. Further discussion is found in the
work of the Bureau of the Census (1978a, pp. 4, 6).

Estimates of net internal migration to New York are probably understated
because the matching-based estimates of net migration between Puerto Rico
and New York combine underestimates of Puerto Rico-to-New York migration
with more accurate estimates of New York-to-Puerto Rico migration.
This arises from peculiarities in the tax laws as they affect Puerto Rico.
Persons living in Puerto Rico typically do not need to file an Internal
Revenue Service (IRS) individual income tax return and so will not be
matched if and when they migrate to New York. On the other hand, most
Puerto Ricans returning from New York to Puerto Rico will probably file a
tax return with IRS (to recover withholding taxes), giving Puerto Rico as
place of residence.

Errors in postcensal estimates of state populations connected with the 
estimates of net migration from abroad arise from (1) errors in allocating 
the immigrants to the correct states, (2) errors in the estimate of the total 
number of immigrants from abroad, and (3) treatment of emigrants by 
foreign countries. Discussion of point 2 as a source of error can be found 
in section 1.3 above. The effect of point 3 is complicated because of the 
adjustment of total migration to the national control. 


PART 3 COUNTY ESTIMATES 

3.1 INTRODUCTION 

Postcensal estimates of county populations are calculated by methods 
generally similar to those discussed in Part 2. Other methods may also be 
utilized at the substate level because some states prepare their own esti- 
mates. These are scaled to sum to the Census Bureau’s estimate of the 
state total and then averaged with the Bureau’s estimates of substate pop- 
ulations. 

It is important to distinguish among three sets of county estimates:
“provisional,” “preliminary” or “ORS” (for Office of Revenue Sharing),
and “revised.” Provisional estimates are made roughly 6-12 months after
the reference date for the estimates, the revised estimates abo 
later, and the ors estimates sometime in between. 

Because the provisional estimates are made before the Internal Revenue
Service (IRS) tax return data are available, these estimates do not use
the administrative records method (AR). Rather, component method II
(CM II) is used to estimate the population change over the year preceding
the estimate date t, by calculating the difference between the CM II
estimates for t and t - 1. In the case of large metropolitan counties a
housing unit method is generally also used to estimate the 1-year
population change. For these counties the estimates of change from the
housing unit method and CM II are averaged. The derivation of the
provisional county estimates may be represented symbolically as
provisional estimate (t) = revised estimate (t - 1) + change over
(t - 1, t], where change over (t - 1, t] is estimated either by the change
in CM II estimates alone from date t - 1 to date t or by the average of the
changes in CM II and the housing unit method estimates from t - 1 to t. In
several states (18 for
estimates and 16 for 1976) other methods supplant the housing unit
method in computing the provisional estimates.

Generally, the ORS estimates are derived according to

ORS estimate (t) = revised estimate (t - 1) + change over (t - 1, t],

where change over (t - 1, t] is estimated by the average of changes in CM II
and AR estimates from t - 1 to t. In some states, additional methods are
averaged to estimate change over (t - 1, t]. However, the Census Bureau
requires that estimates within a state be the product of a uniform
methodology, so additional methods are averaged only if they provide
estimates for all counties in a state. Thus for the 1975 ORS estimates the
housing unit method was used in only one state (Florida), where housing
unit method estimates were available for all counties without exception.
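
Both the provisional and the ORS estimates share the same arithmetic, which the following sketch illustrates. The function name and the dictionary layout are hypothetical; only the rule "previous revised estimate plus the average of the methods' estimated changes" comes from the text.

```python
def updated_estimate(revised_prev, method_estimates):
    """revised estimate (t - 1) + average change over (t - 1, t].

    method_estimates: dict mapping a method name to a pair
    (estimate for t - 1, estimate for t) produced by that method.
    """
    changes = [est_t - est_prev for est_prev, est_t in method_estimates.values()]
    return revised_prev + sum(changes) / len(changes)


# Hypothetical county: CM II and AR disagree on levels, but only their
# estimated changes over the year enter the ORS figure.
ors = updated_estimate(50000.0, {"CM II": (49000.0, 50200.0),
                                 "AR": (49500.0, 50300.0)})
```

Passing a single method (CM II alone) gives the provisional estimate for a county where no housing unit method estimate is used.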

When the results of a special census are available for a county 
used instead of the various postcensal estimates. In this case tl 
ment of county estimates to sum to the state estimate follow 
plicated procedure, which we will refer to here as “rake/flo 
procedure is discussed below in section 4,2 for subcounty estim 
procedure for county estimates is analogous and will not be 
given. 

The notation and conventions introduced in Part 2 will be retained in
the present and subsequent chapters.

For the July 1, 1975, ORS county estimates for Florida, the change was
estimated by a three-way average of changes in AR, CM II, and housing
unit method estimates. (For other exceptions, see Bureau of the Census
(1980).) For Kansas, Missouri, Nebraska, and Washington July 1, 1975,
ORS county estimates, the change was estimated by the average changes in
CM II, AR, and RC estimates. For California, four estimates of change were
averaged: CM II, AR, RC, and the driver’s license address change method
(DLAC).

Revised estimates of county population differ in structure from both
provisional and ORS estimates. The procedure in making the revised
estimate for date t does not employ the revised estimate for date t - 1
explicitly. Rather, CM II, AR, and RC are each used directly to estimate the
population as of date t (in ways similar to those described in Part 2). In
some states a fourth method is used as well. Each method’s set of county
estimates is scaled to sum to the state total, and then the three (or four)
estimates for each county are averaged. This procedure yields the revised
estimates of county populations.

The various methods are described below. 
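
The scale-then-average structure of the revised estimates can be sketched as follows. The function name and data layout are hypothetical, and proportional scaling of each method's county estimates to the state total is an assumption; the special-census procedure mentioned earlier is not covered.

```python
def revised_estimates(method_estimates, state_total):
    """Scale each method's county estimates to the state total, then
    average the scaled estimates county by county.

    method_estimates: dict mapping method name -> dict county -> estimate.
    """
    scaled = []
    for estimates in method_estimates.values():
        factor = state_total / sum(estimates.values())   # assumed proportional
        scaled.append({c: v * factor for c, v in estimates.items()})
    counties = list(scaled[0])
    return {c: sum(s[c] for s in scaled) / len(scaled) for c in counties}
```

Because every method's set is scaled to the same state total before averaging, the averaged county estimates also sum to that total.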


3.2 DRIVER’S LICENSE ADDRESS CHANGE METHOD

The driver’s license address change method (DLAC) is a component
method used by California to estimate county populations. The estimates
are constructed in the following manner: to the base population estimates
are added estimates of natural increase, plus change in the elderly
populations (estimated from changes in Medicare enrollments), plus changes
in military barracks, plus net migration. The distinguishing feature of
DLAC is the way in which net migration is estimated.

Net interstate migration of the population aged 18 to 64 is estimated
using address changes in the California Driver’s License File. Persons
outside this age range are not well represented, and their migration is
estimated separately. Immigration from abroad is also estimated separately.
A variation of CM II is used to estimate net migration of the population
under 18. Migration of persons over 64 is implicitly included in the
estimate of changes in the elderly population. Further detail is given by
Rasmussen (1975).

3.3 HOUSING UNIT METHOD

The state-prepared county population estimates in Florida for 1975 were
based on the housing unit method (HUM). In this method an estimate of
the number of occupied housing units is made and multiplied by an
estimate of the average number of persons per household. To this product


Starsinic and Zitter (1968) and Pittenger et al. (1977). 


3.4 COMPONENT METHOD II 

The use of cm ii for counties essentially parallels that for states (see sec- 
tion 2.2 above), with exceptions noted in the following descriptions of the 
methods used in connection with each component. 


3.4a Elderly Population 

This component is estimated just as at the state level (see section 2.2b 
above). 


3.4b Special Populations 

Because group quarters populations may account for a more significant 
share of a county’s population than of a state’s population, these popula- 
tions are estimated more painstakingly at the county than at the state 
level. The group quarters populations considered at the county level in- 
clude inmates of prisons or of long-term hospitals, college students living 
in dormitories, residents of Job Corps centers, and members of the armed 
forces living in military barracks. 

For these special populations, annual observations are obtained and net 
changes over the year are computed. Net movement of the barracks 
populations for counties is estimated by allocating the state total among 
the counties, according to the 1970 census distribution of males aged 
14-17. 


3.4c Births and Deaths to Young 

These components are estimated analogously to their state-level counter- 
parts, with two major differences. The first difference is that at the county 
level the Census Bureau does not use reported county deaths by race. The 
second difference is that the reported births and deaths for the counties 
are not adjusted to the state total (which had been adjusted to the national 
total). Thus births are estimated simply by obtaining the number of 
reported births for each county from state vital statistics departments 
through members of the Federal-State Cooperative Program (fscp). 

Young deaths for counties over the interval (0, t] are estimated as
follows: Let subscripts r, a, i, and j refer to race, age, state, and county
and let the argument x refer to the year ending December 31. Race r takes
on two values (white, black and other), as does age a (young, elderly).
Define

$D_{ij}(x)$   reported number of deaths for county j, state i, year x (obtained
            from state vital statistics departments);
$d_{ra}$      NCHS estimate of the nationwide period death rate for persons of
            race r and age group a over the interval (0, t];
$P_{raij}$    count of race r, age a population of county j in state i on April 1,
            1970.

The estimated number of deaths for county j, state i over the interval (0, t]
will be denoted by $D_{ij}$ and is obtained by summing $D_{ij}(x)$ over time periods
x and interpolating at the ends of the interval. For example, with t
referring to July 1, 1973, $D_{ij}$ satisfies

\[
D_{ij} = .75\,D_{ij}(70) + D_{ij}(71) + D_{ij}(72) + .5\,D_{ij}(73).
\]

The sum of $D_{ij}$ over counties j is not controlled to a state total.

An initial estimate, $D'_{aij}$, of the number of deaths over (0, t] to age
cohort a in county j, state i is obtained by applying the national period
death rates by age and race to the corresponding county cohorts in 1970
and summing over races:

\[
D'_{aij} = \sum_r P_{raij}\, d_{ra}.
\]

These initial estimates are then used to apportion the reported county
deaths into those for the two age groups. Thus the deaths to the young in
county j, state i over the interval (0, t] are estimated by

\[
\mathrm{DEAY}(0, t; i, j) = D_{ij} \cdot \frac{D'_{yij}}{D'_{yij} + D'_{eij}},
\]

where a takes on the values y (young) and e (elderly).
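
The apportionment of reported county deaths between the young and the elderly can be sketched as follows. The function name, the dictionary keys, and the numbers in the usage example are hypothetical; the weights shown are those of the July 1, 1973, example in the text.

```python
def young_deaths(reported, weights, pop, rate):
    """Estimate deaths to the young in one county over (0, t].

    reported: dict year -> reported county deaths D_ij(x)
    weights:  dict year -> fraction of that year's deaths falling in (0, t]
    pop:      dict (race, age) -> April 1, 1970, county population P_raij
    rate:     dict (race, age) -> national period death rate d_ra
    Ages are "y" (young) and "e" (elderly).
    """
    # Total deaths over the interval, interpolating at both ends.
    d_ij = sum(weights[x] * reported[x] for x in weights)
    races = {r for r, _ in pop}
    # Initial estimates D'_aij: national rates applied to county cohorts.
    initial = {a: sum(pop[(r, a)] * rate[(r, a)] for r in races)
               for a in ("y", "e")}
    # Apportion reported deaths in proportion to the initial estimates.
    return d_ij * initial["y"] / (initial["y"] + initial["e"])


deaths = young_deaths(
    reported={70: 400, 71: 100, 72: 100, 73: 200},
    weights={70: 0.75, 71: 1.0, 72: 1.0, 73: 0.5},
    pop={("white", "y"): 1000, ("other", "y"): 500,
         ("white", "e"): 200, ("other", "e"): 100},
    rate={("white", "y"): 0.01, ("other", "y"): 0.02,
          ("white", "e"): 0.2, ("other", "e"): 0.3},
)
```

The national rates serve only to split the reported total between the two age groups; the level of deaths comes entirely from the county's own reports.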


3.4d Non-Group Quarters Migration of the Young

This component is estimated essentially as at the state level (see section
2.2e) with certain differences. For counties the base period school-age and
young female migration rates SCLMIGRAT(0) and FEMIGYRAT(0) are
10-year rates, calculated over the previous intercensal decade. In
addition, the school-age and young female migration rates for the base
period and for the current postcensal period are each multiplied by factors
to account for underexposure of the entire cohort to migration. For
example, in calculating the base period migration rate for the school-age
population (aged 6.25 to 14.24 on April 1, 1970), allowance is made for the
fact that children aged 6.25 to 7.25 on April 1, 1970, were only exposed to
migration (on the average) 6.75 years rather than the full 10 years. (More
details are given by van der Vate (1978).) Thus the analog for counties of
the calculation in section 2.2e is

\[
\mathrm{FEMIGYRAT}(t) = \mathrm{SCLMIGRAT}(t)
    + [\mathrm{FEMIGYRAT}(0) - \mathrm{SCLMIGRAT}(0)],
\]

where the various rates have been multiplied by the factors for
underexposure.


3.4e Adjustment to Totals 

The process of adjusting to totals is the same at the county as at the state
level, except that births and deaths are not adjusted separately. The
adjustment factors for births and deaths, as stated in section 2.2f above,
are both set equal to unity.


3.5 SOURCES OF DATA AND ERROR IN CM II 

The discussion in section 2.3 above applies here as well. In addition,
problems with Medicare data and group quarters migration estimates are
more severe at the county level. Some counties (especially in Florida) do
not have complete Medicare coverage (see Irwin, 1978). Furthermore,
differential coverage of the elderly population by Medicare has more
significant impact for counties than for states.

Geographic coding of Medicare records is also problematic, in that
address codes are derived largely from the names of cities, some of which
spread across county lines. In addition, extensive areas beyond the limits
of a city frequently carry the city name. When such areas extend into a
second county, the addresses are apt to be coded to the county containing
the major part of the city. The independent cities in Virginia especially
are affected in this way, so that estimates of the elderly populations of
adjoining counties are subject to large error (see Irwin, 1978).
Another source of error arises when a Medicare enrollee who has
filed for benefits makes an address change for social security purposes
and the Medicare address is automatically changed to agree with
the social security address.

The data on group quarters populations are provided by state agencies
involved in the Federal-State Cooperative Program (FSCP). The county
figures are sums of the figures for subcounty areas (see section 4.1f for
more discussion).


3.6 STRUCTURE OF ERROR IN CM II

The error structure in CM II at the county level roughly parallels that at
the state level, except that the components of error are larger at the
county level. (See section 2.4 above for relevant discussion.)

3.7 ADMINISTRATIVE RECORDS METHOD

Postcensal estimation using AR is approximately the same for counties as
for states. All components except net migration are estimated just as in
CM II (see sections 3.4, 3.5, and 3.6 above).

Data on place of intended residence for resident aliens (immigrants who
declare their intentions to secure U.S. citizenship) are kept by the U.S.
Immigration and Naturalization Service for states and places with 1970
populations of 100,000 or more. Explicit estimates of the number of
immigrants from abroad are made for areas within a state having fewer than
100,000 residents in 1970 by the use of the number of persons of foreign
birth reported in the 1970 census. Estimates of immigration from abroad
to counties are derived by summing the estimates of immigration to places
within the county.

Estimation of the county-level, young non-group quarters net migration
resembles that for the state level. Let s and t denote the time references
for the base population and the current estimate, respectively. The young
non-group quarters net migration rate IRSRAT(s, t; i, j) for county j in
state i is calculated as

\[
\mathrm{IRSRAT}(s, t; i, j) =
  \frac{\mathrm{INS}(s, t; i, j) - \mathrm{OUTS}(s, t; i, j)}
       {\mathrm{OUTS}(s, t; i, j) + \mathrm{NONMOV}(s, t; i, j)},
\]

where INS(s, t; i, j) is the number of exemptions on matched individual
federal income tax returns classified as inmigrants to county j in state i
over the period (s, t], such that the tax returns did not have exemptions
for age or blindness; OUTS(s, t; i, j) is the number of exemptions on
matched individual federal income tax returns classified as outmigrants
from county j in state i over the period (s, t], such that the tax returns
did not have exemptions for age or blindness; and NONMOV(s, t; i, j) is the
number of exemptions on matched individual federal income tax returns
classified as nonmovers from county j in state i over the period (s, t],
such that the tax returns did not have exemptions for age or blindness.

Exemptions on a matched return are classified as inmigrants
(outmigrants) if the designated county for the address on the return
differs for dates s and t and the address at date t (date s) is in county j
in state i (see Galdi, 1978). An exemption on a matched return is classified
as a nonmover if the address on the return is designated to be in county j
in state i for both dates s and t. The number of exemptions refers to the
number at date t.

To estimate the net migration for the young non-group quarters
population, the migration rate IRSRAT is multiplied by a population base
MIGBASE defined by

\[
\mathrm{MIGBASE}(s, t; i, j) = \mathrm{NGQPOP}(s; i, j)
  + \tfrac{1}{2}\,[\mathrm{BIR}(s, t; i, j) - \mathrm{DEAY}(s, t; i, j)
  + \mathrm{NETMOVY}(s, t; i, j) + \mathrm{IMM}(s, t; i, j)],
\]

with notation

NGQPOP(s; i, j)      young non-group quarters population at time s in
                     county j in state i;

NETMOVY(s, t; i, j)  net movement of young from the military population and
                     from overseas to the resident civilian population in
                     county j in state i over the period (s, t];

IMM(s, t; i, j)      immigration from abroad to county j in state i over the
                     period (s, t].


3.8 ADJUSTING ADMINISTRATIVE RECORDS METHOD ESTIMATES TO TOTALS

As in the methods discussed above, AR estimates of county population are
scaled to sum to the estimate of state population. Changes in the
estimates effected by this last scaling are all attributed to the net
migration component.
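
A minimal sketch of such proportional scaling follows. It assumes a simple pro rata adjustment; the function name `scale_to_control` and the population figures are hypothetical.

```python
def scale_to_control(estimates: dict[str, float], control_total: float) -> dict[str, float]:
    """Scale estimates proportionally so that they sum to control_total."""
    current_total = sum(estimates.values())
    if current_total <= 0:
        raise ValueError("estimates must have a positive sum")
    return {area: value * control_total / current_total
            for area, value in estimates.items()}


# Hypothetical state with two counties whose AR estimates sum to 100,000,
# while the state estimate is 102,000.
county_ar = {"county_1": 40_000.0, "county_2": 60_000.0}
state_estimate = 102_000.0
scaled = scale_to_control(county_ar, state_estimate)

# The change produced by scaling is attributed to net migration.
migration_adjustment = {area: scaled[area] - county_ar[area] for area in county_ar}
print(scaled, migration_adjustment)
```

Each county keeps its share of the state total; only the level changes.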


At the county level, residence classification is difficult because a mailing
address is not always sufficient to determine county of residence. A major
problem arises when the post office in a city of one county serves residents
of an adjacent county; thus people report their addresses as being in the
city of the post office rather than in that of their residence. This problem
typically occurs when a town straddles county boundaries or when adjacent
counties have towns with the same name. To ameliorate this problem, a
special question was placed on the 1972 and 1975 tax forms to obtain
information on state, county, incorporated place, and township of
residence. Galdi (1978) has described the use of the data obtained from
this question.

The discussion of state-level error for AR in section 2.9 above applies to
the county level as well.

3.10 RATIO-CORRELATION METHOD

Estimation of county populations using RC differs somewhat from
state-level estimation: at the county level the elderly population is not
treated separately. Thus at the county level, RC estimates total non-group
quarters population rather than non-group quarters young population.
Otherwise, the estimation of non-group quarters population is the same as
for states except that (1) the kinds of symptomatic data used vary for
different states (see Bureau of the Census (1980) for details) and (2) the
complications involving “area coverage ratios” (discussed toward the end of
section 2.5f above) are not introduced at the county level.

The discussion of RC in sections 2.5, 2.6, and 2.7 above is thus relevant
here as well.


3.11 USE OF SPECIAL CENSUSES

If a special census was taken for a county within a year of the postcensal
estimate date, the special census count replaces the average of the
methods’ estimates for that county. Since special censuses usually do not
fall precisely on July 1, the counts are typically interpolated backward or
extrapolated forward according to the trend since April 1, 1970.

Using the results of the special census for succeeding updates is
straightforward for the AR method, which estimates population change since
the last update. The special census count is reflected in the estimate of
BASEPOP. Component method II and the ratio-correlation method, however,
always refer to changes since the last decennial census, which makes
it more difficult to use past special censuses in succeeding updates when
these methods are used.

For illustrative purposes, suppose a special census were conducted on
July 1, 1975, and that the CM II and RC estimates for this date were 1,200
and 1,000 lower, respectively, than the special census count. For the 1975
estimate the special census count would be used. For the 1977 estimate the
1975 special census would be reflected in the BASEPOP estimate used in
AR, but it would not be reflected in the CM II and RC estimates. The Census
Bureau would make use of the 1975 special census by adding 1,200
and 1,000 to the 1977 CM II and RC estimates. The implicit assumption is
that either (1) the methods are biased (i.e., the assumptions don’t apply to
the county under consideration), (2) the 1970 data are in error, or (3) the
1975 data are lagging in indicating population change. According to point
1 or 2 the CM II and RC estimates would be too low throughout the decade.
According to point 3 the CM II and RC estimates would be too low for a
while but would ultimately catch up to the true level of change. If point 3
were relevant and the CM II and RC estimates did catch up, the Census
Bureau would like to stop adding 1,200 and 1,000 to the CM II and RC
estimates. To determine whether the estimates were catching up, the Census
Bureau would monitor the time series of population changes as estimated
by the different methods and look for sharp shifts occurring in RC
or CM II but not in AR. If this were noted, the Census Bureau would stop
adding in the differences between the special census and the methods’
estimates, 1,200 and 1,000, and no further explicit consideration of the
past special census would be taken.
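
The bookkeeping in this illustration can be sketched as follows. The offsets of 1,200 and 1,000 come from the example above; the census count, the method estimates, the function names, and the `caught_up` flag (standing in for the Bureau's judgment from monitoring the time series) are all hypothetical.

```python
def census_offsets(special_census: float, method_estimates: dict[str, float]) -> dict[str, float]:
    """Difference between the special census count and each method's estimate."""
    return {method: special_census - estimate
            for method, estimate in method_estimates.items()}


def apply_offset(raw_estimate: float, offset: float, caught_up: bool) -> float:
    """Add the carried-forward offset unless the method is judged to have caught up."""
    return raw_estimate if caught_up else raw_estimate + offset


# Hypothetical July 1, 1975 figures chosen to reproduce the offsets in the text.
offsets = census_offsets(50_000, {"CM II": 48_800, "RC": 49_000})

# Hypothetical raw 1977 estimates; RC is assumed to have caught up, CM II is not.
cm2_1977 = apply_offset(50_500, offsets["CM II"], caught_up=False)
rc_1977 = apply_offset(50_900, offsets["RC"], caught_up=True)
print(offsets, cm2_1977, rc_1977)
```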


PART 4 SUBCOUNTY ESTIMATES 

The administrative records method (AR) is generally the only method used
to make postcensal population estimates for subcounty units. However,
the results of recent special censuses are used when available, in lieu of the
AR estimates. When the special census estimates are used, the adjustment
of the subcounty estimates to sum to county estimates follows a
complicated procedure, sometimes called “rake/float,” to be discussed below.
A few states provide subcounty estimates of their own. These are scaled
to sum to the county estimates and then averaged with the AR estimates.

4.1 ADMINISTRATIVE RECORDS METHOD 

Let time T = 0 refer to April 1, 1970, and let T be scaled in years. The
time references for the AR estimates, s and t, will correspond to the time
references for the base population and the current estimate, respectively.
The notation introduced below will refer to subcounty unit k in county j of
state i.

The resident population estimate AR(t; i, j, k) consists of seven elements
as follows:

  non-group quarters population at time s (NGQPOP(s; i, j, k))
+ births to residents over the interval (s, t] (BIR(s, t; i, j, k))
- deaths to residents over the interval (s, t] (DEA(s, t; i, j, k))
+ net non-group quarters inmigration over the interval (s, t]
  (NETMIG(s, t; i, j, k))
+ immigration from abroad over the interval (s, t] (IMM(s, t; i, j, k))
+ population in military barracks at time t (MILBAR(t; i, j, k))
+ members at time t of special populations other than military barracks
  residents (IC(t; i, j, k)).

It is important to notice that the elderly population is no longer treated 
separately, because Medicare data are not available for measuring change 
below the county level. 

Each of the above elements will now be discussed in turn. 


4.1a Non-Group Quarters Population at Time s: NGQPOP(s; i, j, k)

Let ORS(s; i, j, k) denote the final estimate of resident population for date
s. The notation “ORS” is appropriate because this estimate of population
is used by the Office of Revenue Sharing. Then NGQPOP is calculated by

$$
\mathrm{NGQPOP}(s; i, j, k) = \mathrm{ORS}(s; i, j, k) - \mathrm{MILBAR}(s; i, j, k) - \mathrm{IC}(s; i, j, k).
$$


4.1b Births Over (s, t]: BIR(s, t; i, j, k)

Neither nchs nor state vital statistics offices compile data on resident 
births for most of the places of population under 10,000 (more than half of 
the subcounty units). 

In estimating BIR(s, t; i, j, k) the following procedure is used to allocate
reported county births to all the subcounty units for which reports of
births are questionable or not available.

First, the area-specific, age-adjusted fertility rate for the census year
1970 is established, according to the distribution of the population under
1 year old on April 1, 1970. The proportion of population aged under 1
year in county (i, j) living in subcounty area (i, j, k) is calculated
according to

$$
\mathrm{PU1}(0; i, j, k) = \frac{\mathrm{U1}(0; i, j, k)}{\sum_{k} \mathrm{U1}(0; i, j, k)}
$$

where U1(0; i, j, k) is the population aged under 1 on April 1, 1970, in
area (i, j, k).

The number of births in the year ending April 1, 1970, is then estimated
by

$$
\mathrm{B}(0; i, j, k) = \mathrm{PU1}(0; i, j, k) \times \mathrm{B}(0; i, j),
$$

where B(0; i, j) is the number of births to residents of county (i, j) during
the calendar year 1970.

The fertility rate for women 15 to 39 years old in area (i, j, k) on April 1,
1970, is calculated by

$$
\mathrm{FR}(0; i, j, k) = \frac{\mathrm{B}(0; i, j, k)}{\mathrm{F1539}(0; i, j, k)}
$$

where F1539(0; i, j, k) is the number of women aged 15 to 39 on April 1,
1970, residing in non-group quarters.

This census year fertility rate is then applied to estimate births during
the following year:

$$
\mathrm{B}(1; i, j, k) = \mathrm{FR}(0; i, j, k) \times \mathrm{F1539}(1; i, j, k),
$$

where F1539(1; i, j, k) is the number of women aged 15 to 39 on April 1,
1971, residing in non-group quarters.

To estimate F1539(1; i, j, k), the following procedure is used. A net
migration rate for the young population for the year (0, 1] is estimated
from matching of IRS tax returns. Essentially, this rate is calculated
analogously to IRSRAT, described in section 3.7 above. Denoting this
migration rate by MIGYRAT(1; i, j, k), calculate

$$
\mathrm{F1539}(1; i, j, k) = \mathrm{F1539}(0; i, j, k) + \mathrm{F1539}(0; i, j, k) \times \mathrm{MIGYRAT}(1; i, j, k).
$$


Recursively, for time T = 2, 3, 4, ..., 9, calculate

$$
\mathrm{F1539}(T; i, j, k) = \mathrm{F1539}(T - 1; i, j, k) + \mathrm{F1539}(T - 1; i, j, k) \times \mathrm{MIGYRAT}(T; i, j, k),
$$

where MIGYRAT(T; i, j, k) refers to migration over the interval
(T - 1, T]. For noninteger values of T, F1539(T; i, j, k) is computed by
linear interpolation.
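
The recursion and the interpolation step can be sketched as below. The function names and the constant 2 percent migration rate are hypothetical; only the update rule and the use of linear interpolation come from the text.

```python
def project_f1539(f0: float, migyrat: dict[int, float], last_year: int = 9) -> list[float]:
    """Return F1539(T) for T = 0, 1, ..., last_year."""
    series = [f0]
    for year in range(1, last_year + 1):
        previous = series[year - 1]
        # F1539(T) = F1539(T-1) + F1539(T-1) * MIGYRAT(T)
        series.append(previous + previous * migyrat[year])
    return series


def f1539_at(series: list[float], t: float) -> float:
    """Linear interpolation between the integer-year values."""
    if not 0 <= t <= len(series) - 1:
        raise ValueError("t outside the projected range")
    lower = int(t)
    if lower == len(series) - 1:
        return series[lower]
    weight = t - lower
    return series[lower] + weight * (series[lower + 1] - series[lower])


# Hypothetical unit: 1,000 women aged 15 to 39 in 1970, 2 percent net
# inmigration in every year.
rates = {year: 0.02 for year in range(1, 10)}
f1539 = project_f1539(1000.0, rates)
print(f1539[1], f1539[2], f1539_at(f1539, 0.25))
```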

The annual resident births subsequent to 1970 are estimated on the
basis of the female population 15-39, estimated as above. To maintain
consistency with the annual birth statistics for the county resident
population, however, these estimated resident births for area (i, j, k) are
adjusted (ADJB) to sum to the county total B(T; i, j):

$$
\mathrm{ADJB}(T; i, j, k) = \mathrm{B}(T; i, j, k) \times \frac{\mathrm{B}(T; i, j)}{\sum_{k} \mathrm{B}(T; i, j, k)}
$$

Further adjustments incorporated in ADJB will be discussed below. Note
that we have yet to derive B(T; i, j, k) for T > 1.

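To estimate B(2; i, j, k), we make use of ADJB(1; i, j, k) to update the
fertility rate, so

$$
\mathrm{FR}(1; i, j, k) = \frac{\mathrm{ADJB}(1; i, j, k)}{\mathrm{F1539}(1; i, j, k)}
$$

and

$$
\mathrm{B}(2; i, j, k) = \mathrm{FR}(1; i, j, k) \times \mathrm{F1539}(2; i, j, k).
$$

For integer T > 1 the formulas are

$$
\mathrm{FR}(T; i, j, k) = \frac{\mathrm{ADJB}(T; i, j, k)}{\mathrm{F1539}(T; i, j, k)}
$$

and

$$
\mathrm{B}(T; i, j, k) = \mathrm{FR}(T - 1; i, j, k) \times \mathrm{F1539}(T; i, j, k).
$$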

¹²Specifically, MIGYRAT(T; i, j, k) is calculated as IRSRATY(s, T; i, j, k)/(T - s), where s is
the latest time prior to T for which the tax file is available and IRSRATY(s, T; i, j, k) is
calculated the same way as IRSRAT(s, T; i, j, k) in section 4.1d, except that only returns not
claiming exemptions for old age or blindness are used. (This is the same set of returns used to
estimate county rates IRSRAT(s, T; i, j); see section 3.7 above.)

On the basis of the distribution of estimated births by place, “tolerance
intervals” are constructed (see Cavanaugh, 1977, pp. 33-35). Recall that
for roughly half of the subcounty units, information on reported births is
available from NCHS or from the state vital statistics offices. These
reported figures are accepted as estimates of births only if they fall within
the appropriate tolerance interval. Otherwise, the reported figures are
replaced by the estimates derived above. These estimates are also used if
no reported data are available. At this point, the estimates of births for
subcounty units are adjusted to sum to the estimate of total county births.
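
One year of the birth recursion (estimate, adjust to the county total, update the fertility rate) might be sketched as follows. The tolerance-interval check is omitted because its construction is not given here; the function name and all figures are hypothetical.

```python
def birth_step(fr_prev: dict[str, float], f1539_now: dict[str, float],
               county_births: float) -> tuple[dict[str, float], dict[str, float]]:
    """One annual step: B(T), then ADJB(T), then FR(T) for each subcounty unit."""
    # B(T) = FR(T-1) * F1539(T)
    births = {k: fr_prev[k] * f1539_now[k] for k in fr_prev}
    total = sum(births.values())
    # ADJB(T): scale so that the units sum to the county total B(T; i, j).
    adjb = {k: births[k] * county_births / total for k in births}
    # FR(T) = ADJB(T) / F1539(T), carried into the next year.
    fr_now = {k: adjb[k] / f1539_now[k] for k in adjb}
    return adjb, fr_now


# Hypothetical county with two subcounty units and 143 reported resident births.
fr_0 = {"k1": 0.10, "k2": 0.08}
f1539_1 = {"k1": 500.0, "k2": 1000.0}
adjb_1, fr_1 = birth_step(fr_0, f1539_1, county_births=143.0)
print(adjb_1, fr_1)
```

Because the adjusted births feed the next year's fertility rate, any county-level change in fertility is passed on to every subcounty unit in proportion.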


4.1c Deaths Over (s, t]: DEA(s, t; i, j, k)

As in the case of resident births, reports of deaths are not available from
NCHS or state vital statistics offices for most small places. Thus for over
half of the subcounty units, deaths must be estimated by indirect methods,
rather than by direct reports of deaths. The procedure to estimate deaths
applies effectively the same logic that underlies the estimation of births,
described in section 4.1b above.

While the estimated female population aged 15-39 composes the basic 
reference for consideration of birth events, age distributions of the 
estimated populations of subcounty areas are the most direct referent in 
estimating deaths. Hence the “young” population (under 65 years old), 
“elderly” (over 65), and deaths occurring to these two broad age groups 
are treated separately in allocating county resident deaths to subcounty 
areas. Racial differences in mortality are also handled in the subcounty
estimation, by allocating white and nonwhite deaths according to the
racial distributions observed in the 1970 decennial census. After the
annual estimates of young and elderly populations are separated into white
and nonwhite components according to the 1970 proportions, the
allocation of resident deaths in the county among the subcounty areas
proceeds similarly for both racial categories. The following description
of the allocation procedure therefore uses notation with the suffix W for
the white population and is not repeated for the nonwhite population.

First, the area-specific, age-race-specific death rates for calendar year
1970 are established. Members of the Federal-State Cooperative Program
obtain counts of the total deaths in each county by contacting state vital
statistics departments. These deaths are allocated to the four age-race
groups (young and elderly by white and nonwhite) in the county on the
basis of statewide death rates for the four groups. These death rates are
estimated from life tables constructed by the National Center for Health
Statistics. The county-level deaths for each of the four age-race groups are
then prorated by age-race to each subcounty unit according to the unit’s
share of the county population. For example, let DEAYW(0; i, j) be the
estimate of deaths to young whites of county j, state i in calendar year 1970
and let POPYW(0; i, j, k) be the number of young white non-barracks
residents of subcounty area k on April 1, 1970. The number of young
white deaths in subcounty area k during the calendar year 1970 is
estimated according to


$$
\mathrm{DEAYW}(0; i, j, k) = \mathrm{DEAYW}(0; i, j) \times \frac{\mathrm{POPYW}(0; i, j, k)}{\sum_{k} \mathrm{POPYW}(0; i, j, k)}
$$

The corresponding estimation for the elderly population is, with
corresponding notation,

$$
\mathrm{DEAEW}(0; i, j, k) = \mathrm{DEAEW}(0; i, j) \times \frac{\mathrm{POPEW}(0; i, j, k)}{\sum_{k} \mathrm{POPEW}(0; i, j, k)}
$$

The death rates in 1970 are then calculated as

$$
\mathrm{DEAYWRAT}(0; i, j, k) = \frac{\mathrm{DEAYW}(0; i, j, k)}{\mathrm{POPYW}(0; i, j, k)}
$$

and

$$
\mathrm{DEAEWRAT}(0; i, j, k) = \frac{\mathrm{DEAEW}(0; i, j, k)}{\mathrm{POPEW}(0; i, j, k)}
$$

These death rates are applied to the respective estimates of population 
for the subsequent year, 1971, for an estimate of resident deaths in that 
year. The annual estimates of resident deaths in subcounty areas are in 
turn adjusted to the county total. Again recursively, the adjusted area 
deaths are used to compute the area death rate, to be used for the estimate 
of resident deaths in the succeeding year. The procedure is thus quite 
similar to that for births. The annual estimates of the population by age- 
race for each subcounty area will now be described in some detail. 

Since the component of population change by death is considered for
the non-group quarters population only, the annual estimates of
population used to multiply the death rates must be diminished by estimates of
the group quarters population and some part of the net movement from
non-group to group quarters population over the year. In practice, the
initial estimate for the young population at time T = 1 is computed as

$$
\mathrm{NBY}(1; i, j, k) = \mathrm{POPY}(0; i, j, k) - \mathrm{MILBAR}(0; i, j, k) + \{\mathrm{POPY}(0; i, j, k) - \mathrm{MILBAR}(0; i, j, k)\} \times \mathrm{MIGYRAT}(1; i, j, k),
$$

where MILBAR refers to the barracks population (assumed all young) and
MIGYRAT is the migration rate for young persons (described in section 4.1b
above). On the basis of the racial composition observed in the 1970
census, the estimate NBY(1; i, j, k) is partitioned into estimates of the white
and nonwhite subpopulations. These race estimates are then multiplied by
the death rates computed earlier, yielding estimates of deaths by race to
the young population.

The procedure for the elderly is analogous. The migration rate for the
elderly, MIGERAT, is calculated in a manner similar to MIGYRAT, except
that only tax returns with age or blindness exemptions are used. The
elderly population at time T = 1 is initially (before deaths) estimated by

$$
\mathrm{POPE}(0; i, j, k) + \mathrm{POPE}(0; i, j, k) \times \mathrm{MIGERAT}(1; i, j, k).
$$

Then this estimate is partitioned into estimates by race (according to the 
racial composition observed in the 1970 census), and the death rates 
discussed above are applied to the respective initial estimates of the elderly 
population by race. This yields estimates of deaths by race to the elderly 
population. 

For every subcounty unit in a county the estimates of deaths for each of 
the four age-race groups are separately scaled to sum to a county control. 
The scaled components are then added to yield an “adjusted” estimate of 
deaths over (0, 1] for the subcounty unit. 

Tolerance intervals are constructed and used for deaths as for birth esti- 
mates. 

To develop estimates of deaths for times later than T = 1, the
procedure described above is applied recursively¹³ in the manner outlined in
section 4.1b for recursive estimation of births.


4.1d Net Migration Over (s, t]: NETMIG(s, t; i, j, k)

The non-group quarters migration rate for subcounty units,
IRSRAT(s, t; i, j, k), is calculated analogously to IRSRAT(s, t; i, j) as
described for counties in section 3.7 above:

$$
\mathrm{IRSRAT}(s, t; i, j, k) = \frac{\mathrm{INS}(s, t; i, j, k) - \mathrm{OUTS}(s, t; i, j, k)}{\mathrm{OUTS}(s, t; i, j, k) + \mathrm{NONMOV}(s, t; i, j, k)}
$$

¹³For example, in deriving an initial estimate of the young or elderly populations for T = 2
(see two preceding displays for T = 1), allowance is made not only for migration but also for
deaths over the interval (0, 1].


where INS, OUTS, and NONMOV for subcounty unit k are, with one difference,
defined analogously to their county-level counterparts. The difference is
that exemptions for age and blindness were excluded from the county
analysis but included in the subcounty analysis. Thus at the county level,
IRSRAT refers to the young only, but at the subcounty level it refers to both
young and elderly. Thus we have


INS(s, t; i, j, k)       exemptions on matched individual federal income
                         tax returns classified as inmigrants to subcounty
                         unit k in county j, state i over the period (s, t];

OUTS(s, t; i, j, k)      exemptions on matched individual federal income
                         tax returns classified as outmigrants from
                         subcounty unit k in county j, state i over the
                         period (s, t]; and

NONMOV(s, t; i, j, k)    exemptions on matched individual federal income
                         tax returns classified as nonmovers from
                         subcounty unit k in county j, state i over the
                         period (s, t].


The returns are matched by social security number, and the number of
exemptions refers to the number at date t.

4.1d(1) The Special Problem of Residence Classification

In making subcounty estimates an important procedural element involves the
assignment of geographic locations to the tax returns. It should be noted
that all problems concerning residence classification are greater at the
subcounty than at the state or county level. In order to determine the
governmental unit to which the exemptions on a given tax return should be
referred, each tax return must be assigned a geographic code identifying
the state, county, minor civil division if any, and city, borough, or village.
Assignment of geographic codes is difficult because they cannot be
accurately determined solely on the basis of mailing address (state and post
office names) given by the filer of the tax return. For one thing, many
subcounty governmental units do not have a post office. Moreover, the postal
delivery area of a subcounty governmental unit that has a post office does
not in general coincide with the unit’s geographic boundaries. Finally, the
mailing address and place of residence of the filer can differ.

In an effort to assist the Census Bureau in assigning geographic codes,
the Internal Revenue Service asked a residence question on the 1972 and
1975 tax returns. Complete responses to the questions were received for
over 70 percent of the returns in 1972 and for over 95 percent of the
returns in 1975.¹⁴ The relevant portion of the 1975 income tax form is
reproduced as Figure A-1. While the information from the residence
questions allowed assignment of geographic codes to the tax returns for 1972
and 1975, assignment of geographic codes is necessary for every year.
Geographic codes also need to be assigned to those returns with
incomplete responses to the residence question. In order to work with
limited information, the Census Bureau adopted the following imputation
procedures (see Galdi (1978) for more details).

For each year for which the residence question was asked, a geographic 
“coding guide” was created. These guides relate the responses to the 
residence questions with the mailing addresses. In particular, each 
residence response is assigned a geographic code. Each mailing address is 
also coded to an address “key” identifying the state, zip code, first seven 
letters of post office name, and address type (numeric, rural, post office 
box, or other). For each key, the distribution of geographic codes
corresponding to residence responses is observed. For example, suppose that
for a given key, the residence responses on 1975 tax returns containing
mailing addresses corresponding to that key were distributed as 84.12
percent inside the limits of city X in county Y, and 15.88 percent in county Y
but outside the city limits. For each key, the observed distribution is used
to assign “probability codes” to mailing addresses corresponding to that
key. In other words, the probability codes are geographic codes that are
randomly assigned to address keys, where the probability that any
particular geographic code is assigned to a given key equals the observed
proportion of geographic codes appearing on tax returns with that key. For
the key in the above example the geographic code for city X in county Y
would be assigned the probability .8412, and the geographic code for the
“balance of county” for county Y would be assigned the probability
.1588.

Probability codes are used as surrogate geographic codes when the lat- 
ter are not available. For tax returns in years other than 1972 or 1975 the 
probability codes are assigned according to the observed distribution for 
the most recent year for which the coding guide is available (currently 
1975). 
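
The random assignment of probability codes can be sketched with the standard library. The 84.12 and 15.88 percent shares are those of the example above; the code labels, the function name, and the seed are hypothetical, and the Bureau's actual random-assignment mechanism is not described here.

```python
import random


def assign_probability_code(distribution: dict[str, float], rng: random.Random) -> str:
    """Draw one geographic code with probability equal to its observed share."""
    codes = list(distribution)
    weights = [distribution[code] for code in codes]
    return rng.choices(codes, weights=weights, k=1)[0]


# Observed distribution of residence responses for one address key.
key_distribution = {"city X, county Y": 0.8412, "balance of county Y": 0.1588}

rng = random.Random(1975)  # fixed seed so the sketch is repeatable
draws = [assign_probability_code(key_distribution, rng) for _ in range(20_000)]
share_city = draws.count("city X, county Y") / len(draws)
print(f"share assigned to city X: {share_city:.3f}")
```

Over many returns sharing the key, the assigned codes reproduce the observed distribution, even though any single return may be assigned to the wrong unit.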

¹⁴The 1972 tax forms contained the residence question on the second page, while the
questions for 1975 appeared at the top of the first page. The questions were also worded
differently.




Probability codes are used in classifying matched pairs of tax returns as
inmigrants, outmigrants, or nonmovers for a subcounty unit. For
estimating migration over 1976 to 1978 (using 1975 and 1977 tax returns), the
procedure was as follows:

1. The mailing addresses on the pair of matched returns are compared.
If the address keys are the same or if other parts of the mailing addresses
match, the persons represented by exemptions on the returns are
classified as nonmovers.

2. If the mailing addresses do not match, the persons represented by
exemptions on the returns are classified as inmigrants to the subcounty
unit by using the geographic code (or probability code) for the later (1977)
tax return and as outmigrants from the subcounty unit using the
geographic code (or probability code¹⁵) for the earlier year (1975).

4.1d(2) Use of Tolerance Levels

Another difference between the calculation of IRSRAT at the county level
and at the subcounty level is the use of tolerance intervals to stabilize the
values of IRSRAT for certain subcounty units. If a place with fewer than
20,000 people had a coverage rate (ratio of exemptions on matched
individual income tax returns to non-group quarters base year population)
falling outside a tolerance interval of 66 percent to 150 percent of the
county coverage rate, IRSRAT for the place was set equal to IRSRAT for one
of two larger areas. For estimates prior to 1977, if IRSRAT for the place was
within 10 percent of the county IRSRAT, it was equated to the county
IRSRAT.¹⁶ Otherwise, IRSRAT for the place was set equal to IRSRAT for the
ensemble of all places under 20,000 population in the county whose
coverage rates fell inside the tolerance interval. The procedure now
practiced uses the latter “ensemble” rate, unless it differs from the county
rate by more than 10 percent (of the county rate), in which case the county
rate is used. These stabilizations are applied because, in the case of smaller
areas, unusual coverage rates are often a symptom of geographic coding
problems arising from post office consolidations, new incorporations or
annexations, places split between counties, and distinct places possessing
identical names (see Bureau of the Census (1980) or Healy (1978) for
further discussion).
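
The stabilization rule as currently practiced might be expressed as the following decision function. It is a sketch of the rule as worded above, with hypothetical names and figures, and it does not cover the pre-1977 variant.

```python
def stabilized_irsrat(place_pop: float, place_coverage: float, county_coverage: float,
                      place_rate: float, ensemble_rate: float, county_rate: float) -> float:
    """Choose the migration rate to use for a place under the tolerance rule."""
    ratio = place_coverage / county_coverage
    # Places of 20,000 or more, or with coverage inside 66-150 percent of
    # the county coverage rate, keep their own rate.
    if place_pop >= 20_000 or 0.66 <= ratio <= 1.50:
        return place_rate
    # Otherwise use the ensemble rate, unless it differs from the county
    # rate by more than 10 percent of the county rate.
    if abs(ensemble_rate - county_rate) > 0.10 * abs(county_rate):
        return county_rate
    return ensemble_rate


# Hypothetical small place whose coverage rate is about 56 percent of the county's.
rate_used = stabilized_irsrat(place_pop=5_000, place_coverage=0.50, county_coverage=0.90,
                              place_rate=0.080, ensemble_rate=0.021, county_rate=0.020)
print(rate_used)
```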

¹⁵Probability codes were used for the 5 percent of the 1975 returns for which complete
responses to the residence question were not available. Probability codes were also used for
4.4 percent of the 1975 returns that were believed to contain reporting or coding errors.

¹⁶However, if the difference was within 5 percent of the county IRSRAT or if the difference in
net migrants was less than 10, the original IRSRAT for the subcounty unit was not modified.

Net migration is estimated as

$$
\mathrm{NETMIG}(s, t; i, j, k) = \mathrm{IRSRAT}(s, t; i, j, k) \times \mathrm{MIGBASE}(s, t; i, j, k),
$$

where

$$
\mathrm{MIGBASE}(s, t; i, j, k) = \mathrm{NGQPOP}(s; i, j, k) + \tfrac{1}{2}\left\{\mathrm{BIR}(s, t; i, j, k) - \mathrm{DEA}(s, t; i, j, k) + \mathrm{IMM}(s, t; i, j, k)\right\}
$$

and NGQPOP(s; i, j, k) is the non-group quarters population at date s in
subcounty unit k of county j, state i.
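
A small numerical sketch of the net migration computation follows; the function names and all input figures are hypothetical.

```python
def migbase(ngqpop: float, births: float, deaths: float, immigration: float) -> float:
    """Base population plus half of the non-migration change over (s, t]."""
    return ngqpop + 0.5 * (births - deaths + immigration)


def netmig(irsrat: float, base: float) -> float:
    """Net migration: the migration rate applied to the migration base."""
    return irsrat * base


# Hypothetical subcounty unit.
base = migbase(ngqpop=10_000, births=300, deaths=200, immigration=20)
net = netmig(irsrat=0.01, base=base)
print(base, round(net, 1))
```

Adding half of the births, deaths, and immigration places the base roughly at the midpoint of the interval, so the rate is applied to an approximate average population at risk of migrating.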

The discussion of error in using AR (see section 2.9 above) is relevant
for the subcounty estimates as well as for the state estimates.


4.1e Immigration From Abroad Over (s, t]: IMM(s, t; i, j, k)

For every place whose 1970 population was at least 100,000, data on the
number of immigrants from abroad are provided by the Immigration and
Naturalization Service. Immigrants from abroad for the balance of the
state (i.e., the state excluding places of 100,000 or more) are apportioned
among places of less than 100,000 according to the number of persons of
foreign birth counted there in the 1970 census.


4.1f Military Barracks and Other Group Quarters Population:
MILBAR(s, t; i, j, k) and IC(s, t; i, j, k)

Information on special populations is gathered on an annual basis by
members of the Federal-State Cooperative Program (FSCP). The Census
Bureau has requested that the FSCP members obtain, at a minimum, data
on (1) military barracks with over 100 people and (2) any other special
population comprising at least 500 persons and at least 2 percent of the
area’s population.

The extent of data collected varies widely from state to state. Some FSCP
members keep track of just points 1 and 2, while others obtain data on
even the smallest group quarters populations. The group quarters
populations considered are members of the armed forces living in military
barracks, inmates of prisons, inmates of long-term hospitals, and, as a proxy
for college students living in dormitories, college students enrolled in
full-time programs. (Further details may be found in Bureau of the Census
(1980).)

4.1g Annexations and New Incorporations

In January of each year the Census Bureau conducts the Boundary and
Annexation Survey to determine whether there have been any boundary
changes or governmental reorganizations (incorporations or
disincorporations) during the preceding calendar year. The units of
government surveyed include county governments and the governments of
incorporated places. From 1971 to 1977 the Census Bureau did not include
incorporated places with population under 2,500 in this survey. Beginning
January 1978, however, all incorporated places were surveyed. The
reason for the increase in the frame is related to the conduct of the 1980
census. Information about unincorporated places (townships) is provided
by the underlying counties in which the places (townships) are located.

The procedures for adjusting the population estimates to reflect
boundary changes will be described, first for areas of at least 5,000 population¹⁷
and then for the remaining areas. Areas of at least 5,000 population that
have undergone boundary changes are identified by the Boundary and
Annexation Survey. For these areas the Census Bureau performs what is
called a “separation”: the 1970 population of the annexed or de-annexed
area is computed from the 1970 census records. Prior to 1977 the Bureau’s
rule was that the postcensal population estimates would be recomputed to
reflect boundary changes only if the 1970 population of the annexed (or
de-annexed) area exceeded 5 percent of the 1970 population of the
annexing area. At present, however, postcensal population estimates for all
areas of at least 5,000 population are recomputed to reflect any new
separations, such that¹⁸ (1) the boundary changes involved new
geography, e.g., a place in one township or county was annexed into another
township (or county), (2) the 1970 population of the annexed (or
de-annexed) area was at least 100,¹⁹ or (3) a boundary change in a previous
year had resulted in change of at least 5 percent in the area’s population
estimate. Currently, a separation is performed for an area of at least
5,000, provided that the area’s estimate²⁰ of the population of the annexed
(or de-annexed) area, as reported in the Boundary and Annexation
Survey, is at least 5 percent of the 1970 population of the annexing area.

To recompute the population estimate for an area undergoing boundary 
changes, the Census Bureau attributes to the annexed (or de-annexed) 
area the estimated growth rate for the annexing area and then adds (or 
subtracts) the annexed (or de-annexed) area’s population estimate to the 
annexing area’s estimate. 
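
The recomputation can be sketched as follows, reading "growth rate" as the ratio of the annexing area's current estimate to its 1970 population (an assumption about the exact definition). The function name and figures are hypothetical.

```python
def recompute_for_boundary_change(annexing_estimate: float, annexing_1970: float,
                                  changed_area_1970: float, annexed: bool = True) -> float:
    """Update an area's estimate after an annexation (or de-annexation)."""
    # Carry the changed area's 1970 population forward at the annexing
    # area's estimated growth since 1970.
    changed_area_estimate = changed_area_1970 * annexing_estimate / annexing_1970
    if annexed:
        return annexing_estimate + changed_area_estimate
    return annexing_estimate - changed_area_estimate


# Hypothetical city: 10,000 in 1970, now estimated at 12,000, annexing
# territory that held 500 people in 1970.
new_estimate = recompute_for_boundary_change(12_000.0, 10_000.0, 500.0)
print(new_estimate)
```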

Prior to 1977, population estimates for areas whose population
numbered under 5,000 were not recomputed to reflect boundary changes
(except in unusual cases). The rule is now that estimates are recomputed to
reflect boundary changes for an area of population under 5,000 if the area
requests and agrees to pay for the separation. The procedures for
recomputing the estimates for areas under 5,000 are the same as those described
above for use in areas of at least 5,000 population.

¹⁷These areas include both those with at least 5,000 population counted in the 1970 census
and those with postcensal population estimates of at least 5,000.

¹⁸The rules are not rigid, and the postcensal population estimates are recomputed in other
cases as well.

¹⁹In practice, this rule is not strictly applied, and many smaller separations are also taken
into account.

²⁰This estimate usually refers to current population.

Regardless of the size of the area, updates by the AR method for the area
in later years do not have to be modified to account for the boundary
changes, because the additional population is reflected in the estimate of
MIGBASE used to multiply the migration rate.

4.2 ADJUSTMENT OF ESTIMATES AND USE OF SPECIAL CENSUSES

The AR estimates of subcounty populations are scaled to sum to the county
totals. The procedure is analogous to that described for county estimates
in section 3.8 above. In a few instances, estimates of subcounty
populations prepared by the state are also used by the Census Bureau. In such
cases these estimates are scaled to sum to the county totals and then
averaged with the AR estimates.

One final adjustment procedure remains. When a recent special census
tally of subcounty population is available, it replaces the AR estimate of
population or average of AR and state-prepared estimate, hereafter called
“AR estimate.” To force the total of the subcounty (county) estimates to
sum to the county (state) totals, the “rake/float” procedure is used:

1. If the sum of 1970 populations of places in a county receiving a
special census is at least one third of the 1970 county population, the sum
of the differences between the AR estimates and the special census
estimates is added (“floated”) to the county total.

2. If the sum of 1970 populations for places in a county receiving a
special census is less than one third of the county total, but the sum of
the differences between the AR estimates and the estimates from special
censuses exceeds in absolute value 3 percent of the county total, the
excess over the 3 percent is added (“floated”) to the county total, and the
remainder (= 3 percent of the county total) is distributed in proportion to
estimated population (“raked”) over the areas in the county that did not
have a special census.

3. If neither point 1 nor point 2 applies, then the sum of the differences
between the AR estimates and the estimates from special censuses is
distributed proportionately (“raked”) over areas within the county that did
not have special censuses.
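
The three rules can be sketched in code. The following is a minimal illustration, not the Census Bureau's program: the function name, argument layout, and sign convention (special census minus AR estimate) are assumptions. It also assumes that the 3 percent threshold is measured against the current county estimate, and that everything is floated when no places remain to be raked over.

```python
import math


def rake_float(ar, census, pop1970, county_total, county_pop1970):
    """Apply the rake/float rules to one county (illustrative sketch).

    ar             -- {place: AR estimate}, assumed already scaled to county_total
    census         -- {place: special census count} for a subset of the places
    pop1970        -- {place: 1970 census population}
    county_total   -- current county population estimate
    county_pop1970 -- 1970 census population of the county
    Returns (adjusted place estimates, adjusted county total).
    """
    # Assumed sign convention: special census count minus AR estimate.
    diff = sum(census[p] - ar[p] for p in census)
    share = sum(pop1970[p] for p in census) / county_pop1970
    limit = 0.03 * county_total  # assumed base for the 3 percent threshold

    others = [p for p in ar if p not in census]
    others_total = sum(ar[p] for p in others)

    if share >= 1 / 3 or others_total == 0:
        floated = diff                               # rule 1 (or nothing to rake over)
    elif abs(diff) > limit:
        floated = diff - math.copysign(limit, diff)  # rule 2: float only the excess
    else:
        floated = 0.0                                # rule 3: rake everything
    raked = diff - floated

    adjusted = dict(census)  # special census counts replace the AR estimates
    for p in others:
        weight = ar[p] / others_total if others_total else 0.0
        adjusted[p] = ar[p] - raked * weight  # offset the raked amount proportionately
    return adjusted, county_total + floated
```

In every case the adjusted place estimates sum to the adjusted county total, which is the purpose of the procedure.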


[...] estimate described in section 3.1 above. For counties the rake/float
procedure is analogous to that just described for places. Changes in
subcounty or county estimates brought about by raking, floating, and [...]
to county or state totals are all attributed to the net migration
components.

4.3 SOURCES AND STRUCTURE OF ERRORS 

Geographic coding is a major source of error in estimating [...]
migration by the administrative records method. As has been noted,
problems arise because the mailing address on a tax return is often [...]
for determining in which unit of local government the filer [...] many
cases the residence of the filer is not the same as the mailing address.
For example, in many areas, people living outside the town limits receive
mail at post office boxes within the town limits. In a [...] number of
cases the Census Bureau is unable to assign a mailing address to a unique
subcounty unit because zip codes, street addresses, and post office
jurisdictions often span geopolitical units. Also, some p[...] several
counties, and the mailing address will not indicate to which [...] the
address belongs, nor will the mailing address indicate whether [...]
inside or outside the city limits.

Using information obtained from special questions on residence appearing
on the tax forms for 1972 and 1975, the Census Bureau constructed coding
guides, which were used to assign tax returns to [...] residence on the
basis of reported mailing addresses. Errors arise [...] of this coding
guide as well. First, there are response errors to the questions on
residence. The response rate to the question in 1975 was [...] percent,
but there were also errors in the responses obtained. H[...] discusses
errors in the responses to the question, such as a tendency for some
people living outside town limits to report their residence as inside the
limits. Other response errors occur in connection with incorporations,
annexations, boundary changes, places straddling [...] geographic units,
or places in different counties possessing [...] names.

A second, more serious source of error is the use of the coding guide to
assign geographic residence codes to tax returns for years other than
those in which the question on residence is asked. If such a year is close
to the year when the coding guide was created (i.e., a year for which the
question on residence was asked), the chances of error are probably
minimal. However, as the length of time between the year to be coded and
the year the coding guide was created increases, the coding guide will
become more and more seriously outdated because of boundary changes, [...]
mailing addresses caused by postal reorganization, and population growth.

The administrative records method rests on the assumption that the
matching of tax returns for two separate years on the basis of social
security number can yield migration rates that are representative of the
whole population. The data underlying these computed rates obviously do
not apply to (1) persons who do not file a tax return (or are not claimed
as an exemption) at all or (2) persons (or dependents of persons) who filed
a tax return in only one of the two years. There is some question whether
the migration patterns of these people are similar to those of the
population covered by the tax returns (Lowe et al., 1974; Mann, 1978). In
addition, many persons claimed as exemptions (college students and, in
some cases, children of divorced parents) do not reside with the person
claiming them as an exemption.

For areas with population over 5,000, population changes caused by
boundary changes are not as a rule reflected in the postcensal estimates
when these changes are estimated by the annexing area to be less than 4½
percent of the area’s 1970 population. For a large area this annexed area
may contain a large number of people, but if the estimated ratio of the
annexed area’s population to the annexing area’s population is under 4½
percent, no separation will be performed. Population changes resulting
from boundary changes to areas with population under 5,000 are not
reflected in the estimates unless the area requests and pays for a
separation. For those areas undergoing boundary changes but not receiving
separations, population changes arising from boundary changes will be
reflected only through the matches of tax returns. In the matching
process, however, a person who is not a recent migrant into the annexed
area will be treated as a nonmover and hence not reflected in the estimate
of population change. For a resident of the annexed area who is a recent
migrant, determining geopolitical unit of residence presents severe
problems.

Estimation of births and deaths for places of population under 10,000 is
unavoidably problematic because the tabulations of births and deaths for
many of these areas are not available.

In summary, estimation of all components of population change is more
difficult at subcounty than at county or higher levels. The overall extent
of the errors in AR subcounty estimates is discussed in Part 2 of this
report. Little is known, however, about the relative sizes of the errors in
the estimates of the various components.


NOTATION AND CONVENTIONS

The following list indicates the locations of the definitions of various
notation and conventions used in this appendix:

Notation      Definition                                                   Section

ADJB          adjusted estimate of births                                  4.1b
AR            administrative records method                                2.1
B             estimated births                                             4.1b
BIR           births                                                       2.2a
CM II         component method II                                          2.1
DLAC          driver’s license address change method                       3.2
DEA           deaths                                                       4.1
DEAEW         deaths to “elderly” whites                                   4.1c
DEAEWRAT      death rate for “elderly” whites                              4.1c
DEAY          deaths to “young”                                            2.2a
DEAYW         deaths to “young” whites                                     4.1c
DEAYWRAT      death rate for “young” whites                                4.1c
d_ra          national death rate for race r, age a                        3.4c
ENROL         number of children enrolled in grades 1-8                    2.2e
EXSCLPOP      expected school-age population                               2.2e
FEMIGYRAT     migration rate for “young” females                           2.2e
FR            fertility rate for females aged 15-39                        4.1b
f1539         females aged 15-39                                           4.1b
GQPOPY        net change of group-quarters “young”                         2.2a
HUM           housing unit method                                          3.3
IC            special populations other than military barracks residents,
              i.e., institutional and college                              4.1
IMM           immigrants from foreign countries                            3.7,
INS           number of tax exemptions classified as inmigrants            3.7,
IRSRAT        migration rate calculated from IRS tax returns               3.7,
MEDCARE       number of Medicare enrollees                                 2.2b
MIGBASE       population base for multiplying a migration rate             3.7,
MIGERAT       migration rate for “elderly”                                 4.1c
MIGYRAT       migration rate for “young”                                   4.1b
MILBAR        population living in military barracks                       4.1
NBY           non-barracks “young”                                         4.1c
NETMIG        net immigration of non-group quarters residents              4.1
NETMOVY       net movement of young from military group quarters to
              non-group quarters                                           2.2a
NGQMIGY       net migration of non-group quarters “young”                  2.2a
NGQMIGYRAT    migration rate for the non-group quarters “young”            2.2e
NGQPOP        non-group quarters population                                3.7
NGQPOPY       non-group quarters “young”                                   2.5d
NONMOV        number of tax exemptions classified as nonmovers             3.7
OUTS          number of tax exemptions classified as outmigrants           3.7
POPE          “elderly” population                                         2.2a
POPY          “young” population                                           2.2a
pu1           proportion of population under 1 year of age                 4.1b
RC            ratio-correlation method                                     2.1
RESPOP        resident population                                          2.2a
SCLBIR        school-age children born since the last census               2.2e
SCLCHT        cohort of school-age children                                2.2e
SCLDEA        deaths to school-age children                                2.2e
SCLMIGRAT     migration rate of school-age children                        2.2e
SCLPOP        school-age population                                        2.2e
u1            population under 1 year of age                               4.1b
(T1, T2]      time period since T1 up to and including T2                  2.2a

Conventions

elderly       population aged 65 or over on estimate date                  2.1
ORS           population estimate used by Office of Revenue Sharing        3.1, 4.1a
preliminary   second set of county population estimates                    3.1
provisional   first set of county population estimates                     3.1
revised       third set of county population estimates                     3.1

young         population aged less than 65 on estimate date                2.1


Postcensal Per Capita Income Estimation Methods of the Census Bureau:
Summary

DONALD E. PURSELL and BRUCE D. SPENCER


The Census Bureau defines the per capita income of an area as the mean or
average total money income of residents during the preceding year. Thus
the 1974 per capita income of an area is the mean income of the population
on April 1, 1974, during the calendar year 1973. Total money income is the
sum of six components: wage and salary income; nonfarm proprietors’
income; farm proprietors’ income; social security and railroad retirement
income payments; public transfer payments (assistance payments); and other
income, including interest, dividends, unemployment insurance, etc.

To estimate postcensal per capita income for states and counties, the
Census Bureau makes separate postcensal estimates of each of the six
components of total money income, adds them, and then divides the sum by
the estimate of postcensal population.¹ Postcensal estimates of subcounty
per capita income are obtained by direct estimation of the rate of change
in per capita income and the application of this rate to the census
estimate of per capita income. These methods are described more fully
below; see also Bureau of the Census (1980).


STATE UPDATES 

To estimate postcensal per capita income for states, the Census Bureau
updates the estimate of total money income and then divides by the
postcensal population estimate (see Appendix A). In updating the money
income estimate, updates for each of the six components are made
separately and then summed. In this section we consider postcensal
estimates for 1975 (that is, 1974 per capita income).

¹ As discussed below, county wage and salary income is updated on a per
capita basis.

For wage and salary income the Census Bureau uses data from the Internal
Revenue Service (IRS). The ratio of wage and salary income for 1974 to that
for 1969 is estimated by the ratio of wage and salary income reported to
IRS for 1974 to that reported for 1969. This ratio is then multiplied by
the estimate of 1969 wage and salary income from the 1970 census, yielding
the postcensal estimate of wage and salary income.

Updates for the other five types of income are obtained from the Bureau of
Economic Analysis’ (BEA) personal income estimates. The procedure is
similar to that for the wage and salary updates, but an extra adjustment
is used because the BEA personal income figures are based on the midyear
(July 1) population for the respective year. Thus personal income for 1974
refers to income of the 1974 population in 1974, while the Census Bureau’s
total money income for 1974 refers to income the 1975 population received
during 1974.

The ratio of the public assistance component of total money income for
1974 to that for 1969 is estimated by

$$
\frac{\dfrac{\text{1974 BEA public assistance income}}{\text{1974 population}} \times \text{1975 population}}
     {\dfrac{\text{1969 BEA public assistance income}}{\text{1969 population}} \times \text{1970 population}}
$$

To obtain the postcensal estimate of public assistance income, this ratio
is multiplied by the estimate of 1969 public assistance income provided by
the 1970 census. The other components of income (net nonfarm
self-employment, net farm self-employment, social security and railroad
retirement, and other income) are estimated analogously to public
assistance income.

Total money income for 1974 is estimated as the sum of the six income 
components. Each state’s total money income is divided by the postcensal 
estimate of the April 1, 1975, population of the state. This population esti- 
mate is calculated as one-fourth of the July 1 , 1974, postcensal population 
estimate plus three-fourths of the July 1, 1975, estimate. 
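
The state update can be summarized in a short sketch. The function and argument names below are illustrative assumptions, and component amounts are passed as plain dictionaries keyed by component name; this is not the Census Bureau's code.

```python
def april1_population(july1_previous, july1_current):
    """April 1 estimate as 1/4 of the earlier July 1 estimate plus 3/4 of the later one."""
    return 0.25 * july1_previous + 0.75 * july1_current


def bea_component_ratio(bea_1974, pop_1974, pop_1975, bea_1969, pop_1969, pop_1970):
    """Growth ratio for a BEA-based component, re-based from the midyear
    populations (1974, 1969) to the populations the Census concept refers to
    (1975, 1970)."""
    return (bea_1974 / pop_1974 * pop_1975) / (bea_1969 / pop_1969 * pop_1970)


def state_per_capita_income(census_1969, ratios, population_april1):
    """Update each 1969 census component by its growth ratio, sum the
    components, and divide by the April 1 population estimate.

    census_1969 -- {component: 1969 amount from the 1970 census}
    ratios      -- {component: estimated 1974-to-1969 growth ratio}
    """
    total = sum(census_1969[c] * ratios[c] for c in census_1969)
    return total / population_april1
```

For wage and salary income the ratio passed in would be the IRS-based ratio described above; for the other five components it would come from `bea_component_ratio`.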


COUNTY UPDATES 

For four of the six components of income, postcensal estimates of total
money income at the county level are obtained as at the state level. Wage
and salary income and farm self-employment income are estimated
differently.

The wage and salary updates are done on a per capita basis in order to
minimize the effect of possible errors in the geographic coding of tax
returns. The ratio of per capita wage and salary income for 1974 to that
for 1969 is estimated by the ratio of the average reported wage and salary
income per exemption on the 1974 IRS tax forms to the average reported
wage and salary income per exemption on the 1969 IRS tax forms. Postcensal
wage and salary income is then estimated as the product of that ratio, the
estimate of 1969 per capita wage and salary income provided by the 1970
census, and the estimate of the April 1, 1975, population.
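
A brief sketch of the per-exemption update follows; the function and parameter names are assumptions made for illustration.

```python
def county_wage_salary(irs_wages_1974, exemptions_1974,
                       irs_wages_1969, exemptions_1969,
                       census_per_capita_wages_1969, population_april1):
    """Postcensal county wage and salary income.

    The growth ratio is taken on wages per exemption, so returns that are
    miscoded into (or out of) the county affect numerator and denominator
    together rather than the total alone.
    """
    ratio = (irs_wages_1974 / exemptions_1974) / (irs_wages_1969 / exemptions_1969)
    return ratio * census_per_capita_wages_1969 * population_april1
```

If a batch of miscoded 1974 returns with the county's average wage per exemption were added, both `irs_wages_1974` and `exemptions_1974` would rise in proportion and the result would be unchanged, which illustrates why the per capita form is less sensitive to geographic coding errors.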

There are two major problems in obtaining estimates of farm income. First,
county farm income is notoriously volatile, capable of major, sharp,
year-to-year changes. These changes may be either understated or
overstated by the data used to measure them. Second, the problems of
comparability between BEA and Census Bureau estimates for farm
self-employment income are severe. In particular, BEA estimates tend to
show considerably more annual variation than do estimates from censuses
and surveys. For these reasons the Census Bureau initially prepares two
estimates of postcensal farm self-employment income, a “net” farm income
estimate and a “gross change” farm income estimate, and then uses those
estimates to derive a “constrained net estimate” of farm self-employment
income. The “net” farm income estimate is derived as the sum of (1) the
1970 census estimate of 1969 farm self-employment income and (2) the
dollar change in BEA farm self-employment income plus land rent. The
“gross change” farm income estimate is obtained by applying the ratio of
1974 BEA farm receipts to 1969 BEA farm receipts (adjusted to account for
the July 1 reference base of BEA estimates) to the 1970 census estimate of
1969 farm self-employment income and adding to this the sum of (1) the
dollar change in BEA land rent and (2) the 1970 census estimate of 1969
land rent.² The constrained net estimate is then calculated as the median
of three quantities: net farm self-employment income, 80 percent of the
gross change estimate of farm self-employment income, and 120 percent of
the gross change estimate of farm self-employment income. This constrained
net estimate is used as the postcensal estimate of farm self-employment
income. Approximately 25-30 percent of the county estimates are directly
affected by the constraints, that is, are based on the gross change rather
than the net estimate.
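
The three farm income estimates can be written out as follows. This is a sketch with assumed function and argument names; the receipts ratio is taken as already adjusted for the July 1 reference base.

```python
from statistics import median


def net_farm_estimate(census_farm_1969, bea_change_farm_plus_rent):
    """1969 census farm self-employment income plus the BEA dollar change
    in farm self-employment income plus land rent."""
    return census_farm_1969 + bea_change_farm_plus_rent


def gross_change_farm_estimate(census_farm_1969, receipts_ratio,
                               bea_change_rent, census_rent_1969):
    """1969 census farm income scaled by the BEA farm receipts ratio, plus
    the BEA dollar change in land rent and the 1969 census land rent."""
    return census_farm_1969 * receipts_ratio + bea_change_rent + census_rent_1969


def constrained_net_farm_estimate(net, gross_change):
    """Median of the net estimate and 80 and 120 percent of the gross change estimate."""
    return median([net, 0.8 * gross_change, 1.2 * gross_change])
```

Taking the median keeps the net estimate whenever it lies within 80-120 percent of the gross change estimate and otherwise returns the nearer limit. The median form also behaves sensibly when the gross change estimate is negative, where the 80 percent figure is the larger of the two limits.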

The postcensal estimates of the six income components are then added [...]
total money income, the county estimate is divided by the appropriate
estimate of population, yielding the postcensal estimate of county per
capita income. This procedure was followed for 1972 (initial and revised)
and for 1974 (initial) per capita income estimates; however, additional
constraints were incorporated into the procedure beginning with the 1974
(revised) and the 1975 (initial) per capita income estimates.

² The 1970 census estimate of 1969 land rent is estimated as the farm
self-employment income of nonfarmers.

In the new procedure, total money income is decomposed into two parts,
adjusted gross income (AGI) and transfer income (TI). The latter, TI, is
composed of social security income, public assistance income, and part of
“other” income, such as unemployment and veterans’ payments; the former,
AGI, is the rest of total money income. Estimates of AGI are determined by
adding the component estimates derived above, using BEA estimates to
allocate “other” income between AGI and TI. The ratio (for 1974 income)

$$
A = \frac{\text{1974 county per capita AGI} \,/\, \text{1969 county per capita AGI}}
         {\text{1974 state per capita AGI} \,/\, \text{1969 state per capita AGI}}
$$

is computed, where per capita AGI for year 1974 (1969) is the estimated AGI
for 1974 (1969) divided by the estimate of population for 1975 (1970). A
similar ratio was computed from income reported on tax forms,

$$
B = \frac{\text{1974 county IRS AGI per exemption} \,/\, \text{1969 county IRS AGI per exemption}}
         {\text{1974 state IRS AGI per exemption} \,/\, \text{1969 state IRS AGI per exemption}}
$$

where IRS AGI per exemption refers to the ratio of the total AGI income
reported on IRS individual income tax forms to the number of exemptions
claimed on the tax forms. The constrained estimate of county AGI was then
obtained as the median of $A$, $B + 0.25$, and $B - 0.25$. Total money
income is then recomputed by adding the constrained estimate of AGI to the
estimate of TI. The estimates of county total money income are scaled to
sum to the state estimate of total money income. Per capita estimates are
calculated by dividing the totals by the respective population estimates.
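
The constraint on county AGI and the final scaling step can be sketched as below. The names are illustrative assumptions; the text does not spell out how the constrained ratio is converted back into a dollar estimate, so the sketch stops at the ratio and the scaling.

```python
from statistics import median


def relative_growth(county_1974, county_1969, state_1974, state_1969):
    """County growth ratio divided by state growth ratio (the form of A and B)."""
    return (county_1974 / county_1969) / (state_1974 / state_1969)


def constrained_relative_growth(a, b, band=0.25):
    """Median of A, B - 0.25, and B + 0.25: A is kept when it lies within
    0.25 of the tax-based ratio B and is otherwise pulled to the nearer bound."""
    return median([a, b - band, b + band])


def scale_to_total(estimates, control_total):
    """Scale county estimates proportionately so that they sum to the state total."""
    factor = control_total / sum(estimates.values())
    return {area: value * factor for area, value in estimates.items()}
```

For example, with $A = 1.5$ and $B = 1.0$ the constrained ratio is 1.25, while $A = 1.1$ would be left unchanged.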


SUBCOUNTY UPDATES

The derivation of subcounty estimates roughly parallels that of the county
estimates. Significant differences do exist, however:

1. The per capita estimates are updated directly rather than by separate
updating of total money income and population.

2. The BEA estimates are not available for subcounty units; for income
components not measurable from IRS records, county per capita estimates
are applied to all subcounty units.

3. Many constraints are employed to damp changes in the estimates.

4. The 1970 census estimates were modified to reduce sampling variability
and to account for boundary changes and annexations.

5. Changes in subcounty per capita income are estimated in multiple
increments rather than single increments. Thus the change from 1969 to
1974 is estimated by the sum of the changes from 1969 to 1972 and 1972 to
1974; the procedure is similar to the administrative records method used
in population updates (see Appendix A).

6. There is a complicated adjustment of subcounty estimates to county
totals.

For 1972 per capita income updates, the observed rate of change in IRS AGI
per exemption (as defined earlier) and the observed rate of change in BEA
per capita county transfer income were applied to 1970 census estimates of
these components for 1969. The procedures for 1974 (1975) updates were
similar except that the rates of change referred to the period 1972-1974
(1974-1975) and the base estimates were for 1972 (1974) rather than for
1969. For simplicity, only the 1972 updating procedure is described here;
the other procedures are analogous. (For more details, see Herriot (1978)
and Bureau of the Census (1980).)

The first stage of the updating procedure consists of four operations to 
the 1969 base per capita income figures. 

1. The 1969 per capita income figures for areas experiencing annexa- 
tions and boundary changes from 1969 to 1972 were modified to adjust for 
any resultant changes in per capita income. 

2. To reduce the effect of sampling variance in the 1970 census esti- 
mates for places of population under 1,000, a weighted average of the 
1970 census estimates and regression estimates was used. The weights 
were derived by applying James-Stein techniques; see Fay and Herriot 
(1979). 

3. The 1969 income estimates were decomposed into transfer income 
and adjusted gross income so that the irs data on adjusted gross income 
could be used. 

4. After the above three procedures were carried out, the sum of the
estimates for subunits of geography might have differed substantially from
“independent” estimates of the total. An iterative adjustment procedure
was used to simultaneously control subcounty estimates to the county total
and force the sum of the estimates for each of several size classes of
places to add to the statewide totals for the size classes.

The second stage of the updating procedure consisted of estimating and
applying the rates of change in per capita income.

1. The IRS data were adjusted for annexations and boundary changes.

2. To protect against severe errors in data, a host of edits and
constraints (at least 11) were imposed. Those constraints had the effect of
restricting estimates of rates of change for geographical subunits to be
close to the corresponding countywide or statewide average rate of change.

3. After the edits and constraints were imposed, the rates of change in
IRS AGI per exemption and in BEA per capita countywide transfer income
were applied to the respective estimates for the base period.

4. After those updates of per capita adjusted gross income and transfer
income were made, they were separately forced to sum to estimates of the
total. The procedures are similar to those used for controlling the
estimates for the base period.

After controlling to totals, the final per capita income estimate is
obtained as the sum of per capita adjusted gross income (AGI) and per
capita transfer income (TI).


The Role of Judgment in Postcensal Estimation

BRUCE D. SPENCER


Judgment underlies every application of statistical theory and methodol- 
ogy. A statistical procedure may be justified by well-defined assumptions, 
but the applicability of those assumptions in any given situation is deter- 
mined and decided by judgment. In demographic estimation, judgment is 
especially pervasive. The purpose of this paper is to examine how judg- 
ment enters into the formulation and use of the postcensal estimation 
methodology. 

One can distinguish between a stated protocol for assembling and 
analyzing classes of individual units of information and a less rigidly 
stated, more flexible approach. The former may be termed less subjective, 
i.e., less dependent on judgment. There are degrees of subjectivity. For 
example, one procedure might treat all units of information in precisely 
the same manner but with provision for exceptional treatment of units 
possessing specified unusual characteristics. Another procedure might 
rigidly specify the process for every stage of analysis of individual units ex- 
cept the final one, which may depend on judgment. Yet another proce- 
dure may involve interpretation of a body of data that rests almost entirely 
on the judgments of one or more experts. Perhaps one key to distinguish- 
ing fixed methods (objective techniques) from subjective ones (judgment) 
is the extent to which a given procedure is reproducible by a second person
[...] estimate should be rejected in a particular instance, and (4) in
modifying methods.

1. The role of judgment in formulating methodology is indispensable
not only for demography but for all scientific endeavors. Bayesian decision 
theory is the most explicit in its use of subjective notions, but all statistical 
analysis has some degree of subjectivity, e.g., definition, choice of models, 
mechanics of estimation, selection of analytic techniques to guide infer- 
ences, presentation of evidence and conclusion. 

2. Demographic methods draw upon data to produce estimates of pa- 
rameters such as resident population or annual per capita income. Of 
course, data contain errors, and severe errors can cause serious inaccura- 
cies in the estimates. Thus demographers at the Census Bureau screen the 
input data for possible large errors or outliers. If a piece of data coming in 
does not seem consistent with past trends or other current data, it is 
flagged. The decision to use a particular editing protocol or outlier-detec- 
tion technique is usually a matter of judgment, although the use of a 
specified editing routine is most often governed by objective rules. 

Objective rules are illustrated by the Census Bureau’s use of tolerance
intervals for estimation of births, deaths, and migration rates for
subcounty units (see Appendix A, sections 4.1b-4.1d). If the data fail to
meet explicit criteria, the data are rejected. For example, if too few
persons in an area file tax returns, the migration information for the area
is rejected. These rules are applied automatically, by computer.

The screening for outliers of school enrollment data used in component
method II affords a good example of the use of judgment; CM II uses the
calculated migration rate for school-age children in grades 1-8 as the
primary basis for making inferences about the migration rate for the
population as a whole (see Appendix A, section 2.2). For illustrative
purposes, consider estimating the migration rate for these children 1 year
after the last census. Suppose (for simplicity) that all school-age
children are actually enrolled in school and suppose also that no deaths
have occurred to these children during the year. Then the migration rate
for the school-age population is estimated by

$$
\frac{\mathrm{ENROL}_1 - \mathrm{EXPEC}_1}{\mathrm{ENROL}_0}
$$

where $\mathrm{ENROL}_1$ is the number of school-age children reported
enrolled in grades 1-8 for the current year; $\mathrm{ENROL}_0$ is the
number in grades 1-8 for the preceding (census) year; and
$\mathrm{EXPEC}_1$ is the number of children who would have been enrolled
in grades 1-8 had there been no migration. The demographers at the Census
Bureau calculate $\mathrm{ENROL}_0$ and $\mathrm{EXPEC}_1$ on the basis of
census data but must use current data to estimate $\mathrm{ENROL}_1$. To
determine whether these current data are reliable, the demographers rely
on their judgment. They will often consider several comparisons, such as
comparing $\mathrm{ENROL}_1 - \mathrm{ENROL}_0$ with a historical time
series of yearly differences in enrollment or comparing the difference
between $\mathrm{EXPEC}_1 - \mathrm{EXPEC}_0$ and
$\mathrm{ENROL}_1 - \mathrm{ENROL}_0$ with a historical time series of
these differences. If the demographers perceive an abrupt change from a
historical trend, they will flag the current enrollment data,
$\mathrm{ENROL}_1$, as an outlier. The data checks are not performed
according to fixed rules, and thus they may not be replicable.
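
The simplified rate is easy to compute, and one of the comparisons could in principle be written as a fixed rule. In the sketch below the first function follows the formula above; the second is a purely hypothetical formalization (a standard-deviation threshold on past yearly differences), not a rule the Census Bureau is described as using.

```python
from statistics import mean, stdev


def school_age_migration_rate(enrol_1, expec_1, enrol_0):
    """(ENROL_1 - EXPEC_1) / ENROL_0 under the simplifying assumptions in the text."""
    return (enrol_1 - expec_1) / enrol_0


def flag_abrupt_change(past_differences, current_difference, threshold=3.0):
    """Hypothetical rule: flag the current yearly difference if it lies more
    than `threshold` sample standard deviations from the historical mean.
    Requires at least two past differences."""
    spread = stdev(past_differences)
    if spread == 0:
        return current_difference != past_differences[0]
    return abs(current_difference - mean(past_differences)) / spread > threshold
```

A rule of this kind is replicable by a second person, which is the property the judgmental checks lack; the choice of threshold, however, remains a matter of judgment.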

Although the use of judgment can be more complicated than the use of 
formal rules, some judgmental tests can be formalized. Since a formal test 
can be performed more quickly and more accurately, usually by compu- 
ter, a greater variety of tests can be done. Precise specification of proce- 
dures (a prerequisite for objective tests) makes them amenable to statisti- 
cal analysis, so that statistical properties, such as confidence limits, can 
be discovered. Knowledge of these properties could permit rankings of 
priority for data to be screened, so that if large amounts of data are 
suspect, the worst cases can be selected for early screening. Statistical 
analysis can also lead to the introduction of sophisticated improvements. 

A recommended practice is to search for and use methods that tend 
toward objectivity when it can be approached, and to use subjective 
methods (judgment) only when the state of the art fails to provide satisfac- 
tory objective rules and techniques. The increased role of computers in 
data screening should supplement rather than replace the use of judg- 
ment. Freed from the necessity of making routine calculations, analysts 
can spend more time interpreting the results of the calculations and devis- 
ing new kinds of tests. 

Three things can happen to a piece of data flagged as an outlier: (1) an
attempt is made to verify the data, (2) the piece of data is rejected and
replaced by a substitute value, or (3) the datum is “trimmed.” If the datum
is verified, it is accepted; if verification attempts show the datum to be
invalid, practice 2 is used. An example of practice 2 is the estimation of
migration rates for subcounty units with fewer than 20,000 inhabitants
(see Appendix A, section 4.1d). If the data for such a place do not satisfy
certain formal criteria, the migration rate is set equal to either the
county migration rate or the migration rate for all places in the county
whose data did satisfy the criteria. The estimation of postcensal per
capita income provides many illustrations of “trimming” (see Appendix B).
For example, county farm self-employment income is estimated by “net farm
self-employment income,” provided the latter falls between the tolerance
limits of 80-120 percent of “gross change farm self-employment income.” If
the net farm self-employment income falls outside these limits, it is
trimmed to the nearest limit and then used as the estimate of farm
self-employment income.
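
Practices 2 and 3 differ only in what replaces the flagged value, as the following sketch shows. The function names and the boolean `passes_criteria` argument are assumptions for illustration; the formal criteria themselves are described in Appendix A.

```python
def substitute_if_rejected(value, passes_criteria, fallback):
    """Practice 2: keep the value if it meets the formal criteria,
    otherwise replace it with a substitute (e.g., the county rate)."""
    return value if passes_criteria else fallback


def trim(value, limit_a, limit_b):
    """Practice 3: pull a value lying outside the tolerance limits to the nearest limit."""
    low, high = sorted((limit_a, limit_b))
    return min(max(value, low), high)
```

For the farm income example, `trim(net, 0.8 * gross_change, 1.2 * gross_change)` returns the net estimate when it lies inside the 80-120 percent limits and the nearer limit otherwise.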

Current practice at the Census Bureau treats data flagged by computer by
practice 2 or 3 above. Data flagged by judgment are treated by practice 1.
This method is practical but not ideal. The number of separate data that
are flagged by computer is so large that verification is not feasible for
each. But the widespread use of practice 2 or 3 forces the data to reflect
an often unreal stability.¹ Moreover, the tests now used to screen data for
outliers search for data deviating from past trends. If in fact the
underlying parameter changes but the data do not reflect this change, the
data will not be flagged.

There is a need for more verification of data flagged by computer, in- 
stead of mere editing of the data. To verify all the flagged data would be 
prohibitively expensive; what is needed is a way to identify the data whose 
verification should receive priority. One useful approach would be to 
design and implement objective criteria to identify such data and assign 
them priority rankings. The decisions regarding which data should be 
verified could either be made objectively, solely on the basis of the as- 
signed rankings, or subjectively, partly according to the rankings and 
partly according to other considerations. 

3. Judgment is used to decide when a method or estimate should be rejected
in a particular instance. A good example is the decision of when to stop
incorporating information from a past special census into the estimates of
population provided by CM II or the RC method (see section 3.11 of
Appendix A).

4. The Census Bureau continually revises its methodology in minor and 
not-so-minor ways. The decisions to make the revisions are made on the 
basis of statistical evaluations, demographic logic (see Bureau of the Cen- 
sus, 1974), or professional judgment. These approaches are described and 
compared below. 

* Practice 2 typically assigns a large-area (county) rate of change to a component area
(subcounty), and practice 3 shrinks the rate of change toward zero.

In a statistical evaluation the estimates provided by a given method are
compared to “benchmarks” (typically, estimates of high accuracy) such as
decennial or special census counts. Quantitative measures of accuracy can
then be computed to serve as a basis for evaluating the merits of the
method. The benefits of this approach are objectivity and quantification.
Properly performed, a statistical evaluation is not affected by the beliefs of
those performing the evaluation. Furthermore, statistical measures such
as bias and standard deviation can be estimated. The major drawback of
using statistical evaluations is the relative lack of representative
benchmarks. Decennial census counts are of course available only every 10
years; a method that tested well for the 1960s may be poor for the 1970s.
Special censuses are carried out only for a small proportion of areas, and
the areas receiving them constitute a nonrandom sample (see section [...]
of the report). An alternative consists of using low-precision benchmarks,
which may be more readily available than high-precision benchmarks (see
Appendix H). The regression-sample method is another effective
alternative for counties (see section 3.2 of the report and Ericksen (1974)).
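
The accuracy measures mentioned above can be illustrated with a short sketch. It assumes that error is expressed as a percent of the benchmark, which is one common convention and not necessarily the one used in the Census Bureau's tests; the function name and the four sample areas are hypothetical.

```python
import statistics


def accuracy_measures(estimates, benchmarks):
    """Summarize the percent errors of estimates relative to benchmark counts."""
    pct_errors = [100.0 * (est - bench) / bench
                  for est, bench in zip(estimates, benchmarks)]
    return {
        # Mean percent error: positive values indicate overestimation on average.
        "bias": statistics.mean(pct_errors),
        # Sample standard deviation of the percent errors.
        "std_dev": statistics.stdev(pct_errors),
        # Mean absolute percent error ignores the direction of the errors.
        "mean_abs_pct_error": statistics.mean([abs(x) for x in pct_errors]),
    }


# Hypothetical estimates and benchmark census counts for four areas.
estimates = [10500, 19000, 5100, 7600]
benchmarks = [10000, 20000, 5000, 8000]
measures = accuracy_measures(estimates, benchmarks)
```

Because the areas with benchmarks form a nonrandom sample, measures computed this way describe those areas only and need not carry over to all areas.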

Use of logic involves replacing the assumptions underlying a method by
more plausible assumptions. Whereas demographic logic considers the
internal consistency and reasonableness of the methods, professional
judgment focuses on the plausibility of the output of the methods. Such
exercise of professional judgment is similar to statistical evaluations except
that benchmarks are replaced by subjective estimates. The latter, of
course, need not be based on introspection but can draw upon
observations and current data not used by the method under consideration. For
example, the Census Bureau’s decision to drop births as a predictor
variable in the ratio-correlation estimates of state populations in the 1970s
was based on judgment (Bureau of the Census, 1974, p. 11):

For the 1960-1970 period and the 1950-1960 period as well, births had been one of
the strongest indicators of population. . . . However, some States [in the
1970's] were in the process of removing restrictions on abortions in advance of the
1973 Supreme Court ruling. In these States, the decline in the number of births
between 1970 and 1972 was much sharper than for the remainder of the Nation. As
a result, the ratio-correlation estimate gave unrealistically low population estimates
for these States. This was most apparent in the two largest States, California and
New York.

The Census Bureau’s decision not to use the composite method for
estimating county populations in the 1970s was based on similar
reasoning. Note that both the composite and ratio-correlation methods using
births performed very well in the Census Bureau’s tests of methods (see
Bureau of the Census, 1973, 1974). If something is awry in the method or
data, exercise of judgment may be the only recourse. The danger is that
should the estimation method be accurately indicating unexpected to[...],
judgment may obscure perception of these. Where possible, statistical
tests should be used to supplement or supplant this professional
judgment. For example, the regression-sample method could have been used
to justify the decisions just described. Use of error models (as discussed in
Appendix G) can improve the effectiveness of such subjective methods.
Judgment is the means by which challenges to the Census Bureau’s 
estimates are resolved (not necessarily presented). When decisions must 
be made for individual cases that may vary widely in circumstance, judg- 
ment may be the only reasonable method. 

As was discussed in section 1.2c of the report, the Census Bureau sends
population estimates to the areas for their review before the estimates are
published. The Census Bureau also sends a review guide, which is
reproduced below.

REVIEW GUIDE FOR LOCAL POPULATION ESTIMATES


ESTIMATING METHOD 

The population estimate shown in the enclosed notice was developed by use of a component procedure in
which each of the components of population change (births, deaths, net migration, and special populations)
were estimated separately. The estimates were derived in four stages, moving from the 1970 census as the
base year to develop estimates for 1973, and in turn, moving from 1973 as the base year to derive estimates
for 1975, from 1975 as the base year for 1976, and from 1976 as the base for 1977.

Natural change - Reported resident birth and death statistics were used, where available. These data
were collected from State health departments and supplemented, where necessary, by data prepared and
published by the U.S. Department of Health, Education, and Welfare, National Center for Health Statistics.
For subcounty areas where reported birth and death statistics were not available from either source,
estimates were developed by applying fertility and mortality rates.

Migration - Individual Federal income tax returns were used to measure migration by matching individual
returns for successive periods. The places of residence on tax returns filed in the base year and in the
estimate year were noted for matched returns to determine inmigrants, outmigrants, and nonmigrants
for each area. A net migration rate was derived for each locality, based on the difference between the
inmigration and outmigration of taxpayers and dependents, and was applied to a base population to yield
an estimate of net migration for all persons in the area. Immigrants from abroad are added based upon
data from the U.S. Immigration and Naturalization Service.¹

Adjustment for special populations - In addition to the above components of population change, estimates
of special populations were also taken into account. Special populations include persons who are residents
of an institution, college, or military barracks. Data for these groups are collected from the specific
institutions involved.

Other adjustments - In seven States (California, Florida, Oregon, New Jersey, Vermont, Washington, and
Wisconsin) the subcounty estimates developed by this method were averaged together with estimates
prepared by an agency in each State responsible for producing local population estimates. Special censuses
were used in place of an estimate for localities where special censuses were taken close to the estimate date.
The census results were adjusted to represent the population on July 1, 1977. Places which have had
boundary changes since January 1, 1970 and before December 31, 1977, may have their 1970 census
count and 1977 population estimate adjusted to reflect the population living in the annexed areas at the
time of the 1970 census. There is a cost involved for the determination of the 1970 population in annexed
areas, however, and an estimate of this cost can be provided by telephone. In places where this
determination has already been made, the enclosed estimate reflects the adjustment.

COUNTIES

Estimates of the population of counties, independent cities, cities whose boundaries are coextensive with
a county, and cities made up of more than one county, were developed by a technique that differs from
that used for the subcounty places. The reason for this is the availability of more types of data sources
at the county level enabling the derivation of estimates by more than one method.

The first technique is Component Method II. This procedure uses school enrollments to estimate the
migration of persons under the age of 65. Births and deaths are tallied from reported county resident
births and deaths to the population under age 65. The county population over age 65 is estimated based
on the change in the number of Medicare enrollees. These two estimates by age are then added together
to produce an estimate for the total county population.

¹ This number refers to legal immigrants only, since illegal immigrants cannot be enumerated or estimated
accurately from this or other data sources.

The second technique is the Administrative Records method described earlier for subcounty places. The 
only variation at the county level is that the technique is specific for the population under age 65. The 
population over 65 is produced by the same method used in Component Method II. These separate esti- 
mates are combined to generate the total county figure. 

The average population change between 1976 and 1977 for these two methods is added to the 1976 county 
estimate published in Current Population Reports, Series P-26. In approximately 15 States, additional
data are available to permit the use of a third estimating technique that relies upon regression procedures 
to link shifts in local population with changes in related factors that are symptomatic of population change. 
A full discussion of all three methods can be found in the series P-26 reports and in series P-25, No. 640. 

APPEALS AND CHALLENGE CONSIDERATIONS 

The resulting figures for counties and local areas are an estimate of the population, not an actual count. 
A census of the entire U.S. population has not been conducted since 1970. Nonetheless, many public 
programs and planning activities require more up-to-date information. 

Due to the nature of estimates, however, some error is always experienced in any technique used. The
estimates produced by the Census Bureau for all levels of governments have undergone extensive testing
and evaluation, with the results indicating acceptable error levels.² Locally prepared estimates also must be
based upon thoroughly tested and recognized procedures. Even after thorough evaluation of the figures,
challenges to the estimates should only be made when the differences between the Census Bureau estimate
and locally derived figures are substantial enough to take into account expected estimation error levels.

Locally derived alternative estimates that are sent as a challenge should be accompanied by complete 
documentation describing in detail the derivation of the figure and the sources of the data used. For 
example, localities frequently rely upon the housing unit method for an alternative estimate. If a housing 
unit method estimate is sent, the following problems must be accounted for specifically in the documen- 
tation provided: 

1. The building permits must be specific to your incorporated limits only. 

2. Annual time series of residential permits and demolitions from 1970 to the 1977 estimate date 
must be supplied. 

3. Estimates based upon units denoted by type (i.e., single-family and multi-family) are preferred, 
if data permit. 

4. The data must relate to a July 1977, population estimate date only. Therefore, a time lag before
the July 1977 date must be used to allow for the time between the issuance of permits and the 
completion of the units. A lag of 3 to 6 months is appropriate, depending upon local conditions. 

5. Permits for commercial and home improvement projects should be removed from the data to 
reflect only residential units. 

6. Demolitions and conversions to commercial uses should be registered and removed from the 
housing stock. 

7. Vacancies must be accounted for. 

8. Between 1970 and 1977, the U.S. average household size declined by 9 percent as a result of 
fewer births and an increase in one- and two-person households. Although the change will vary 
depending upon the size and type of community involved, any estimate based upon a housing 
unit method must take into account a household size change factor.

9. Nonhousehold populations that are living in group quarters should be accounted for separately. 
That is, they should be removed from the population totals in the initial year and replaced at 
the estimate date. These group quarters populations must consist of only those institutions 
having long-term housing facilities (e.g., college dormitory populations, inmates of Federal 
or State prisons). 


² An evaluation of the methods is published in Current Population Reports, Series P-26, No. 21, and Series
P-25, Numbers 740 to 789. A more detailed evaluation is forthcoming in Current Population Reports,
Series P-25, No. 699.

Although accuracy of the vacancy and population per household factors is critical in the housing unit
method, the inventory of housing should not be overlooked as a potential source of error. 

If utility data are used to estimate the number of occupied residential units instead of building permits, 
all of the above considerations must be accounted for, except for vacancies and the time lag needed for 
building permits. In addition, treatment of the following problems must be documented: 

1. The coverage of the population by the utility must be evaluated against the 1970 household 
count, i.e., the number of housing units serviced by the utility in 1970 should be in general 
agreement with the number of occupied housing units enumerated in the 1970 census. 

2. Master meters should be accounted for, and conversions from master meters to individual meters 
must be checked. 

In order to obtain more accurate current information concerning the vacancy rates and population per 
household factors specific to local areas, some communities have conducted sample surveys. However, in 
such cases, it will be necessary to accompany the results with documentation specifying the sample design, 
the derivation of the sampling frame used, the assumed confidence limits and how they were developed,
and an estimate of sample bias. Final computations should include a measure of the standard error. Other 
areas that are considering undertaking survey work as a part of their appeal should be in contact with us 
before initiating the project. 

If any test of the estimating methodology has been made, the results should accompany the other challenge 
materials. This could consist of a comparison between the 1970 census count and an estimate of 1970, 
using the 1960 census as a starting point and your particular estimating method as the technique used to
derive the 1970 estimate. This would enable us to better assure the accuracy of your particular technique. 

Population projections are not suitable as challenge information since they do not reflect current data 
trends, but rather attempt to predict future change. Frequently projections are based on past growth 
patterns or on a series of assumptions concerning population change factors. Estimates, on the other hand, 
use current data series that are symptomatic of present population changes. Casual personal observations,
“informed” opinions, and similar undocumented information cannot be used as a basis for an appeal. 

Caution should also be taken that the population estimate conforms to the same definition of usual place
of residence as is used in the decennial census. That is, temporary residents who live most of the year 
elsewhere should be excluded. 


THE CHALLENGE PROCESS 

Once a challenge is received by us, it goes through a detailed review. This process includes examination of
the data series provided by the challenging locality together with a second detailed review of the data
used in our estimating procedure. If it is impossible to resolve differences in the results based on the data
series provided, any additional information available to us will be consulted and you may be contacted
for further clarification and help.

If deficiencies are found in the information used by us in preparing the original population estimate, and
if the data supplied by the challenging locality substantiate a different population figure, a change will
be made. The revised estimate will be provided to the Office of Revenue Sharing (ORS) and other Federal
agencies which use the population estimates in the distribution of Federal grant-in-aid funds, and you
will be notified of the change. If a challenge is unsuccessful, it is often due to insufficient data for the
challenge, the data supplied support the original population estimate, or the challenge materials are based
on personal observation rather than firm support data and estimating techniques.

If a challenge is unable to be resolved through the informal procedures described above, a State or unit of
local government may request a formal hearing. Details for formal hearings are contained in regulations
to be printed in the Federal Register during March 1979. The major provisions (1) stipulate that an
informal challenge be filed no more than 180 days after release of the estimates, (2) require a locality to
complete an informal review jointly with the Census Bureau before a formal hearing is allowed, (3) specify
the appointment of a hearing officer to receive both written and oral evidence under oath, (4) allow for
cross-examination of both parties in the proceedings and of all witnesses, if requested, and (5) require that
all action on challenges be completed within one year of the date of release.

In past years, a further and final resolution could be made by conducting a Federal special census.
However, since field preparations are already underway for the 1980 national census, we are unable to contract
for special censuses.

Please accept our thanks in advance for your cooperation and assistance in this review procedure. Also,
please note that the population figure you will receive shortly from ORS may not reflect revisions made
as a result of this challenge process. This is due merely to the timing of the ORS preliminary allocation
work, and will be corrected before the final distributions through general revenue sharing and other
Federal programs.

Data Errors

DETERMINATION OF GENERAL REVENUE
SHARING ALLOCATIONS

General revenue sharing (grs) allocations are determined according to 
data-based formulas. Application of the formulas is complicated and is 
performed by computer. The essentials of the procedure are described 
below; more complete discussions, in order of increasing detail, are found 
in the work of Nathan et al. (1975), U.S. Congress, Joint Committee on 
Internal Revenue Taxation (1973), Spencer (1980), and Bowditch et al. 
(1974). Descriptions of the various kinds of data input to the formulas are
given by Office of Revenue Sharing (1973 et seq.). 

The calculation of general revenue sharing allocations includes four 
major steps. First, allocations to the 51 state areas (the 50 states and the 
District of Columbia) are determined. Second, each state area’s amount is 
split into two shares, a state government share and a statewide local 
government share. Third, each statewide local government share is parti- 
tioned among all county areas. Fourth, each local jurisdiction’s allotment 
is calculated from the total available for the county area containing the 
jurisdiction. Local jurisdictions include county governments, township 
governments, Indian tribal councils, Alaskan native villages, and the 
governments of municipalities and places. The presence of maximum and 
minimum constraints causes the second, third, and fourth stages to be 
performed several times. 

Two formulas are used to determine the allocations to state areas: a
“5-factor” and a “3-factor” formula. Because the “5-factor” and
“3-factor” formulas originated with the House and Senate, respectively,
they are also referred to as the House and Senate formulas. The allocation
to state area i is proportional to the larger of the House amount \(H_i\) and the
Senate amount \(S_i\), given by

\[ H_i = \frac{35}{159}\left[\frac{P_i}{P_+} + \frac{U_i}{U_+} + \frac{P_i/C_i}{(P/C)_+}\right] + \frac{27}{159}\left[\frac{I_i}{I_+} + \frac{E_i T_i}{(ET)_+}\right] \]

\[ S_i = \frac{P_i E_i/C_i}{(PE/C)_+} \]

where

\(P_i\)  population;

\(U_i\)  urbanized population;

\(C_i\)  per capita income;

\(I_i\)  income tax amount, which is the median of three values: \(0.01L_i\),
     \(0.15K_i\), and \(0.06L_i\);

\(L_i\)  federal individual income tax liabilities;

\(K_i\)  state individual income tax collections;

\(T_i\)  net state and local tax collections;

\(E_i\)  tax effort, equal to \(T_i/R_i\);

\(R_i\)  total personal income.

These represent data elements provided by the Bureau of the Census and
other agencies of the U.S. Department of Commerce; the subscript plus
sign signifies summation over the subscript. For example, \(P_+ = \sum_i P_i\),
\((P/C)_+ = \sum_i (P_i/C_i)\), \((ET)_+ = \sum_i E_i T_i\), and \((PE/C)_+ = \sum_i P_i E_i/C_i\).

The fractions 35/159 and 27/159 result from the fact that the legislation
dictates that the allocation is the amount to which the state would be
entitled if one third of $3.5 billion were allocated among states on the
basis of population (\(P_i/P_+\)), urbanized population (\(U_i/U_+\)), and
population inversely weighted for per capita income (\((P_i/C_i)/(P/C)_+\)) and if one
half of $1.8 billion were allocated among states on the basis of each of
income tax collection (\(I_i/I_+\)) and general tax effort (\(E_i T_i/(ET)_+\)). When it
is possible, simplified but correct statements of the formulas are presented.

State area i's portion of the total grs funds allocated for an entitlement
period can be written as \(X_i/X_+\), where \(X_i\) is the maximum of \(S_i\) and \(H_i\).¹
The size of the total amount allocated was essentially fixed for entitlement
periods beginning before January 1, 1977. For later entitlement periods
the total allocation size is determined on the basis of federal individual
income tax collections to a maximum of $6.85 billion per year; the total
allocations in these later entitlement periods have in fact been at the
maximum.
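
The state-level calculation can be sketched as follows. This is an illustration under stated assumptions, not the Office of Revenue Sharing's algorithm: it takes the 3-factor amount to be population times tax effort divided by per capita income, as a fraction of its national sum, and the 5-factor amount to weight the three population-based fractions by 35/159 each and the two tax-based fractions by 27/159 each, as the text describes. The Alaska and Hawaii adjustment is ignored, and the function names, dictionary keys, and sample figures are hypothetical.

```python
from statistics import median


def income_tax_amount(L, K):
    """I_i: the median of 0.01*L_i, 0.15*K_i, and 0.06*L_i."""
    return median([0.01 * L, 0.15 * K, 0.06 * L])


def state_shares(states):
    """Return X_i / X_+ for each state area, where X_i = max(S_i, H_i).

    `states` maps a name to a dict with keys P, U, C, L, K, T, R,
    matching the data elements defined in the text.
    """
    e = {n: d["T"] / d["R"] for n, d in states.items()}  # tax effort E_i
    inc = {n: income_tax_amount(d["L"], d["K"]) for n, d in states.items()}
    # National sums (the "+" subscripts in the text).
    p_sum = sum(d["P"] for d in states.values())
    u_sum = sum(d["U"] for d in states.values())
    pc_sum = sum(d["P"] / d["C"] for d in states.values())
    i_sum = sum(inc.values())
    et_sum = sum(e[n] * d["T"] for n, d in states.items())
    pec_sum = sum(d["P"] * e[n] / d["C"] for n, d in states.items())
    x = {}
    for n, d in states.items():
        senate = (d["P"] * e[n] / d["C"]) / pec_sum
        house = ((35 / 159) * (d["P"] / p_sum + d["U"] / u_sum
                               + (d["P"] / d["C"]) / pc_sum)
                 + (27 / 159) * (inc[n] / i_sum + e[n] * d["T"] / et_sum))
        x[n] = max(senate, house)
    x_sum = sum(x.values())
    return {n: v / x_sum for n, v in x.items()}


# Two hypothetical state areas with made-up data.
example = {
    "A": {"P": 1000, "U": 600, "C": 5.0, "L": 900.0, "K": 50.0,
          "T": 400.0, "R": 5000.0},
    "B": {"P": 2000, "U": 800, "C": 4.0, "L": 1200.0, "K": 20.0,
          "T": 600.0, "R": 8000.0},
}
example_shares = state_shares(example)
```

Because each state area receives the larger of its two amounts, the \(X_i\) generally sum to more than one, which is why the final division by \(X_+\) is needed.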

The allocation to each state area is divided, in the ratio of approxi- 
mately 1:2, between the state government and all local governments in the 
state.^ The allocation to all local governments is called the “local share.” 

The local share is then divided among all county areas proportionally by 
the product of the county area’s tax effort and population divided by per 
capita income. The tax effort of a county area is defined as the ratio of all 
“adjusted” (nonschool) taxes collected by the county and subcounty gov- 
ernments to the product of the county’s population and per capita income. 
Observe that the population factors cancel, so that the proportion of the 
local share going to county area j is 

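\[ \frac{\{(D_{ij} + D_{ij+})/(P_{ij} C_{ij})\}\, P_{ij}}{C_{ij}} \left(\frac{1}{G_i}\right) = \frac{D_{ij} + D_{ij+}}{(C_{ij})^2} \left(\frac{1}{G_i}\right) \qquad \text{(E3)} \]

where

\(D_{ij}\)   adjusted taxes of county government j;

\(D_{ijk}\)  adjusted taxes of subcounty government k in county j (\(D_{ij+} = \sum_k D_{ijk}\));

\(P_{ij}\)   population of county area j;

\(C_{ij}\)   per capita income of county area j;

\(G_i\)    \(\sum_j (D_{ij} + D_{ij+})/(C_{ij})^2\).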


¹ For Alaska and Hawaii the procedure is somewhat different. In order to account for
generally higher price levels, “noncontiguous State adjustment factors,” say \(F_{\text{Alaska}}\)
and \(F_{\text{Hawaii}}\), are determined on the basis of the percentage of basic pay received by
federal employees in those states as an allowance under Section 5941 of Title 5, U.S. Code.
For entitlement periods 1-9 the factors were 1.25 for Alaska and 1.15 for Hawaii. For
entitlement period 10 the factor for Hawaii increased to 1.175, while the factor for Alaska
remained at 1.25. For i denoting Alaska and Hawaii the final state allocation is increased
by the fraction \(F_i - 1\).

Although population does not enter explicitly in (E3), the county area
allocations are constrained by maximum and minimum provisions so that
population does determine some substate allocations. No county area
allocation, on a per capita basis, is permitted to exceed 145 percent or fall
below 20 percent of two thirds of the state area allocation, on a per capita
basis. The proportion of the total local share allocated to a county area so
constrained is thus equal to 1.45 (or 0.20) times the proportion of the state
area population residing in the county area, \(P_{ij}/P_i\).
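
A minimal sketch of the county-level step follows. It computes the unconstrained proportions, in which each county area's weight is its adjusted taxes divided by the square of its per capita income, and then flags the county areas whose proportion falls outside 0.20 to 1.45 times their share of state population. It does not redistribute the excess or shortfall, since that iterative step is not described in the text. It also assumes that the state population equals the sum of the county populations. The function names and sample figures are hypothetical.

```python
def county_proportions(counties):
    """Unconstrained proportion of the local share for each county area.

    `counties` maps a name to a dict with keys:
      D  adjusted taxes of the county and all its subcounty governments
      C  per capita income of the county area
      P  population of the county area
    """
    weights = {name: c["D"] / c["C"] ** 2 for name, c in counties.items()}
    g = sum(weights.values())  # the normalizing sum G_i
    return {name: w / g for name, w in weights.items()}


def population_bounds(counties):
    """(minimum, maximum) proportion implied by the 20- and 145-percent rules."""
    state_pop = sum(c["P"] for c in counties.values())
    return {name: (0.20 * c["P"] / state_pop, 1.45 * c["P"] / state_pop)
            for name, c in counties.items()}


# Two hypothetical county areas of equal population and equal taxes.
counties = {
    "A": {"D": 100.0, "C": 5.0, "P": 100},
    "B": {"D": 100.0, "C": 10.0, "P": 100},
}
proportions = county_proportions(counties)
bounds = population_bounds(counties)
over_max = [n for n in counties if proportions[n] > bounds[n][1]]
under_min = [n for n in counties if proportions[n] < bounds[n][0]]
```

In this example county A, with half the per capita income of county B, would receive four times B's proportion before the constraints are applied, and it is the only area flagged as exceeding the 145-percent limit.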

The partitioning of a county area allocation among local jurisdictions in 
the county takes place in several stages. Each Indian tribe or Alaskan 
native village with members residing in the county is allocated a fraction 
of the county area allocation equal to its proportion of the county area 
population. Next, the remainder of the county area allocation is parti- 
tioned proportionally among the county government, the ensemble of all 
township governments (if any), and the ensemble of all place and munici- 
pality governments by the respective amounts of adjusted taxes of the 
three types of governments. The total for township governments is then 
allocated among the individual townships so that the share for township 
k' is 


\[ \frac{D_{ijk'}}{(C_{ijk'})^2} \left(\frac{1}{G_{ij}}\right) \qquad \text{(E4)} \]

where \(C_{ijk'}\) is the per capita income of township \(k'\) and \(G_{ij}\) equals the sum
over all townships k of \(D_{ijk}/(C_{ijk})^2\). The total for all place and
municipality governments is partitioned analogously.
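
The stages described above can be sketched as follows, before any maximum or minimum provision is applied. The sketch assumes each township's or place's weight within its group is its adjusted taxes divided by the square of its per capita income, consistent with the description of (E4) later in this appendix. The function name, argument layout, and sample figures are hypothetical.

```python
def partition_county_area(amount, county_gov_taxes, townships, places,
                          tribal_pop_fraction=0.0):
    """Split a county area allocation among its local jurisdictions.

    `townships` and `places` are lists of (adjusted_taxes, per_capita_income)
    pairs. `tribal_pop_fraction` is the fraction of county area population
    belonging to Indian tribes or Alaskan native villages.
    """
    # Stage 1: tribes and native villages receive a population-based fraction.
    tribal = amount * tribal_pop_fraction
    remainder = amount - tribal

    # Stage 2: split the remainder three ways in proportion to adjusted taxes.
    township_taxes = sum(d for d, _ in townships)
    place_taxes = sum(d for d, _ in places)
    total_taxes = county_gov_taxes + township_taxes + place_taxes

    # Stage 3: within each group, weight each unit by taxes / income squared.
    def split(pool, units):
        weights = [d / c ** 2 for d, c in units]
        g = sum(weights)
        return [pool * w / g for w in weights]

    return {
        "tribal": tribal,
        "county_government": remainder * county_gov_taxes / total_taxes,
        "townships": split(remainder * township_taxes / total_taxes, townships),
        "places": split(remainder * place_taxes / total_taxes, places),
    }


# Hypothetical county area: two townships with equal taxes, one place.
result = partition_county_area(
    amount=1000.0,
    county_gov_taxes=300.0,
    townships=[(100.0, 5.0), (100.0, 10.0)],
    places=[(400.0, 8.0)],
    tribal_pop_fraction=0.1,
)
```

The two townships collect the same taxes, but the one with half the per capita income receives four times as much of the township total.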

The following maximum and minimum provisions apply to all local 
governments: 

1. No local government’s allocation may exceed, on a yearly basis, 50 
percent of the sum of its revenue from adjusted taxes and intergovern- 
mental transfers. 

2. No local government may receive, on a per capita basis, more than 
145 percent or less than 20 percent of two thirds of the state area alloca- 
tion, on a per capita basis. 

3. Any local government allocation of less than $200 a year shall be for- 
feited by the locality and given to the county government. 

The population of a subcounty area — like that of a county— does not 
enter into the formula unless provision 2 applies. The number of sub- 
county jurisdictions affected by the constraints is indicated in Table E-1. 

TABLE E-1 Data Used to Determine General Revenue Sharing
Allocations, by Size of Place, Entitlement Period 6 (1975-1976)

                                           Number of Subcounty Jurisdictions
                                       Population   Population   Population
Effective Constraint                     to 2,499  2,500-9,999      10,000+        Total

None                                       16,111        3,765        2,154       22,030
Population (a)                              9,330        1,809          686       11,825
Adjusted taxes and intergovernmental
  transfers of revenue (b)                  1,301          253          172        1,726
Below $200 minimum payment                    564            0            0          564
TOTAL                                      27,306        5,827        3,012       36,145

(a) Allocations at the 145- or 20-percent constraint.
(b) Allocations at the 50-percent constraint.

SOURCE: Calculations provided by the Data and Demography Division of the Office of
Revenue Sharing.


Note that less than one third of the units’ allocations were affected by 
population. 

The sequence in which the maximum and minimum provisions are applied
is important. The algorithm that calculates the allocations is complicated
and iterative and is not described here. The law provides some flexibility
for determining the allocations. If the Secretary of the Treasury
decides that the data mentioned above will not “provide for equitable
allocations,” the Secretary may “use such additional data (including data
based on estimates) as may be provided for in the regulations” (P.L.
92-512, Section 109(a)(7)). The law does not require that current estimates
of population, income, or other parameters be produced, but only that if
they are produced, they should be used.

The statute also allows the states to choose among several alternative in- 
trastate allocation formulas (P.L. 92-512, Section 108(c)). To date, no 
states have chosen this option. Section 108(b)(5) authorizes the Secretary 
to determine the subcounty allocations to places, municipalities, and 
townships of populations not greater than 500 solely on the basis of the 
fraction of county population residing in the subcounty unit. Such very 
small places would thus be treated analogously to Alaskan native villages 
and Indian tribes. No measures of per capita income, adjusted taxes, or 
intergovernmental transfers of revenue would be needed for those small 
areas. 

EFFECTS OF DATA ERRORS ON GENERAL
REVENUE SHARING ALLOCATIONS

The effects of data errors on grs allocations have generated much atten- 
tion and misunderstanding. It should be noted that differential errors 
rather than uniform errors in data distort the allocations. Uniform errors 
do not distort the allocation because data elements in the grs formulas 
appear not as totals but as proportions of larger area totals.^ For example, 
state population figures enter only as fractions of the national population. 
Thus uniform relative errors in the population estimates are unimportant. 
Similarly, underestimating all per capita income in the nation by the same 
proportion causes no errors in the determination of grs allocations. 

Differential errors, however, are important. If per capita income is 
underestimated in one state and perfectly estimated elsewhere, then that 
one state’s allocation will be too high and the allocations to the rest of the 
states will be too low (because the total allocation is fixed). Similarly, if 
per capita income is underestimated in one county (or subcounty unit 
within a county) and perfectly estimated elsewhere within the state (or 
county), then that one county’s (or subcounty unit’s) allocation will be too 
high and the allocations to the rest of the counties in the state (or uncon- 
strained subcounty units in the county) will be too low. 
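
The distinction between uniform and differential errors can be checked numerically. The sketch below uses the rule for unconstrained subcounty units, in which shares are proportional to adjusted taxes divided by squared per capita income. The three units and their figures are hypothetical.

```python
def shares(taxes, incomes):
    """Shares proportional to adjusted taxes over squared per capita income."""
    weights = [d / c ** 2 for d, c in zip(taxes, incomes)]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical units with identical taxes and true per capita incomes.
taxes = [100.0, 100.0, 100.0]
true_income = [5000.0, 5000.0, 5000.0]
true_shares = shares(taxes, true_income)

# Uniform error: every income underestimated by 10 percent.
uniform_shares = shares(taxes, [0.9 * c for c in true_income])

# Differential error: only the first unit's income underestimated by 10 percent.
differential_shares = shares(taxes, [0.9 * true_income[0]] + true_income[1:])
```

Under the uniform error every share stays at one third. Under the differential error the first unit's share rises to about 0.38 and the other two fall to about 0.31 each, because the total to be divided is fixed.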

The role of population data in the allocation process is often misunder- 
stood. In practice, estimates of population are quite irrelevant to the allo- 
cations for most local areas. Only areas subject to the 20-percent or 
145-percent constraint are directly affected by errors in estimates of their 
population.'^ As Table E-1 shows, population estimates enter directly into 
the calculation of the allocation for less than one third of the subcounty 
areas. For these areas, local population is the most important element in 
the calculation of the allocation. Slightly less than two-thirds of the sub- 
county jurisdictions receive funds roughly in proportion to the ratio of 
their net nonschool tax revenues to the square of their per capita income, 
divided by the sum of these ratios over all townships or municipalities in 
the county (see (E4)). Thus a given percent error in the population or the
per capita income estimate for a locality is more significant if the locality
is unconstrained (so that per capita income is important) than if the
locality is at the 20-percent or 145-percent constraint (so that population is
important), because per capita income is squared, while population is not.
(This point is dealt with more explicitly below.)

^ The sole exception is the application of the 50-percent constraint, which limits a substate
government’s allocation to no more than one-half the sum of its adjusted taxes and its net
receipts of intergovernmental transfers of revenue.

These areas include those whose allocations were not actually affected by the constraints
but that would have been affected if there were no errors in the population estimates. The
number of such areas is probably not large.

The hierarchical structure of the grs formula insulates the effects of er- 
rors in data for different geographic levels. “Hierarchical” refers to the se- 
quential determination of allocations to different geographic levels: first 
the total pie is divided among the state areas, then each state area’s alloca- 
tion is divided among the county areas, and then each county area’s allo- 
cation is divided among the subcounty units. Thus errors in substate data 
of one state have no effect on allocations within another state. Similarly, 
errors in the per capita income estimates for units within one county cause 
no errors in the allocations within other counties.^ 

These aspects of the grs program are most fortuitous for the Census 
Bureau. Since errors in substate data for one state do not affect the data 
or allocations in any other state, the Census Bureau is free to use different 
methods for substate areas in different states. The Census Bureau takes 
partial but not full advantage of this situation. Thus the Bureau uses different
kinds of data for the ratio-correlation method estimates of county
populations in different states; it also uses locally prepared estimates of
county and subcounty populations in some states but not in others. There
is no statistical reason for the Census Bureau to use the same methods to
estimate the characteristics of counties or subcounty units in different
states. (For discussion of uniformity of procedures, see section 5.2b of the
report.)

For detailed understanding of how data errors affect the grs allocations 
it is useful to examine formulas that explicitly relate errors in data to er- 
rors in allocation. Because the formulas typically are complicated we 
restrict our attention to two examples; see Spencer (1980) and Robinson 
and Siegel (1979) for more development. The first example illustrates the 
effect of error in subcounty per capita income estimates on allocations to 
unconstrained subcounty jurisdictions. The second example analyzes the 
effect of error in subcounty population estimates on allocations to sub- 
county jurisdictions whose allocations are determined according to popu- 
lation data. These jurisdictions are those at the 20-percent or 145-percent 
constraint. 

EXAMPLE 1 

The portion of the county area share allocated to a township (or municipality)
i within the county, but not at a 145-, 20-, or 50-percent constraint,
is calculated to be proportional to the adjusted taxes for township (or
municipality) i divided by the square of the per capita income for township
(or municipality) i. The relative error in this share is approximately*

^Here we ignore the possibility that errors in a locality’s per capita income estimate might
cause its allocation to become constrained (or unconstrained) while it would not be if there
were no error in the per capita income estimate. The number of areas in which this occurs is
not believed to be large.


$$ (d_i - \bar{d}) - 2(c_i - \bar{c}), \tag{E5} $$

where $d_i$ is the relative error of the estimate of adjusted taxes, $c_i$ is the
relative error of the estimate of per capita income, and $\bar{d}$ and $\bar{c}$ are
weighted averages of the relative errors:

$$ \bar{d} = \sum_j w_j d_j, \qquad \bar{c} = \sum_j w_j c_j $$

($\sum_j$ denotes summation over all townships (or municipalities) $j$ in the
county). For an arbitrary township (or municipality) $k$, the weights are
defined as follows:

$$ w_k = \frac{D_k / C_k^2}{\sum_j D_j / C_j^2}, $$

where $D_k$ and $C_k$ denote the actual adjusted taxes and per capita income
of township (or municipality) $k$.

Note that uniform relative errors in the subcounty estimates cancel: if $x$
is added to each $d_j$ or $c_j$, then $x$ is also added to $\bar{d}$ or $\bar{c}$, so that the
relative error in the share of the county area allocation (equation (E5)) is
unaffected. Differential errors, such as $d_i - \bar{d}$ or $c_i - \bar{c}$, are what matter.
(This result does not depend on the approximation leading to (E5); see
(E4).)

Note that errors in subcounty estimates within one county do not affect 
the distribution of the county area allocation for another county because 
the relative errors and weights discussed above all pertain to jurisdictions 
within one county. 
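
The behavior described in Example 1 can be checked numerically. The sketch below is illustrative only: the three-township county, its adjusted taxes, and its per capita incomes are hypothetical numbers, and the function names are not part of the grs formula. It compares the first-order expression (E5) with the exact relative error in each share and confirms that uniform relative errors cancel.

```python
def shares(taxes, incomes):
    """Within-county shares proportional to adjusted taxes / (per capita income)^2."""
    ratios = [d / c ** 2 for d, c in zip(taxes, incomes)]
    total = sum(ratios)
    return [r / total for r in ratios]


def e5_relative_errors(taxes, incomes, d_err, c_err):
    """First-order relative error in each share, following (E5)."""
    w = shares(taxes, incomes)  # the weights w_k coincide with the true shares
    d_bar = sum(wk * dk for wk, dk in zip(w, d_err))
    c_bar = sum(wk * ck for wk, ck in zip(w, c_err))
    return [(dk - d_bar) - 2 * (ck - c_bar) for dk, ck in zip(d_err, c_err)]


def exact_relative_errors(taxes, incomes, d_err, c_err):
    """Exact relative error in each share when the data carry the given relative errors."""
    true = shares(taxes, incomes)
    est = shares([d * (1 + e) for d, e in zip(taxes, d_err)],
                 [c * (1 + e) for c, e in zip(incomes, c_err)])
    return [(e - t) / t for e, t in zip(est, true)]


# Hypothetical three-township county (illustrative numbers only).
taxes = [120.0, 80.0, 200.0]
incomes = [4000.0, 5000.0, 6500.0]
d_err = [0.0, 0.0, 0.0]          # no error in adjusted taxes
c_err = [0.02, -0.01, 0.0]       # relative errors in per capita income

approx = e5_relative_errors(taxes, incomes, d_err, c_err)
exact = exact_relative_errors(taxes, incomes, d_err, c_err)

# Adding the same relative error to every income estimate changes nothing.
shifted = e5_relative_errors(taxes, incomes, d_err, [c + 0.05 for c in c_err])
uniform = exact_relative_errors(taxes, incomes, d_err, [0.05, 0.05, 0.05])
```

For errors of a few percent the two sets of numbers agree to within second-order terms, which is the sense in which (E5) is approximate.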

EXAMPLE 2 

If a county or subcounty unit i is at the 145-percent (or 20-percent) con- 
straint, then its proportionate share of the total substate allocation equals 
1.45 (or 0.20) times the fraction its population is of the total of all sub- 
county areas in the state. The relative error in this share is approximately 

$$ t_i (p_i - p), \tag{E6} $$

where $t_i$ = 1.45 (or 0.20), $p_i$ is the relative error of the population
estimate for subcounty unit $i$, and $p$ is the relative error of the estimate of
total population in the state. As before, note that uniform relative errors
are irrelevant, for if $p_i$ and $p$ are both increased by $x$, the relative error
(E6) is unchanged. Differential errors $p_i - p$ are important.

*See Spencer (1980) for derivation.

Unlike per capita income, subcounty population estimates within one 
county can affect the allocations to subcounty units within other counties 
in part because of the complicated manner in which the constraints are 
implemented. Roughly, the allocations to all county and subcounty areas 
at the 145- or 20-percent constraint are made first, then the remainder is 
distributed to unconstrained county areas on the basis of adjusted taxes 
and per capita income data, and then the allocations to unconstrained 
places within county areas are made. Also, the allocation to an area con- 
strained at the 145- or 20-percent level is determined by the ratio of its 
population to the total population of all grs areas in the state, so errors in 
the estimate for any one of the local areas in the state can affect the alloca- 
tion to any other area subject to a 145- or 20-percent constraint. For ex- 
ample, if the population estimate for a local area subject to a 145- or 
20-percent constraint is too low, then the allocation to that area will be too 
low, and the error in the allocation will be distributed to areas (county and 
subcounty) not at the 145- or 20-percent constraints. 

Formulas similar to (E1) and (E2) can be derived for allocations to all 
levels of geography and for all the various constrained or unconstrained 
situations, but for reasons of space such formulas are not presented here. 
Such formulas are invaluable not only for insight but for detailed analysis. 
Approximate biases and variances of allocations can be calculated from 
such formulas and from estimates of the biases and variances of the 
various data elements. Using those approximations, one can analyze how 
errors in data can be expected to affect errors in allocations; for example, 
one can construct confidence intervals for the allocations. The alternative 
to a stochastic analysis (Spencer, 1980), as described above, is to use 
simulations to study the effects of data error on the allocations (Siegel, 
1975; Siegel et al., 1977; Stanford Research Institute, 1974; Strauss and 
Harkins, 1974) (the latter two studies consider only the effects of popula- 
tion undercount). In the simulation approach, differences between alloca- 
tions under alternative sets of data are studied. 
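
The two approaches can be put side by side in a small sketch. Everything in it is an assumption made for illustration: the county data are hypothetical, the income errors are taken to be independent normal relative errors with a common standard deviation, and the analytic standard deviation is the one implied by applying (E5) with error-free adjusted taxes. It is not a reproduction of any of the cited studies.

```python
import random
import statistics


def shares(taxes, incomes):
    """Within-county shares proportional to adjusted taxes / (per capita income)^2."""
    ratios = [d / c ** 2 for d, c in zip(taxes, incomes)]
    total = sum(ratios)
    return [r / total for r in ratios]


def simulate_relative_errors(taxes, incomes, sigma, n_draws, seed=0):
    """Simulation approach: recompute shares under randomly perturbed income data."""
    rng = random.Random(seed)
    true = shares(taxes, incomes)
    errors = [[] for _ in taxes]
    for _ in range(n_draws):
        noisy = [c * (1 + rng.gauss(0.0, sigma)) for c in incomes]
        est = shares(taxes, noisy)
        for i, (e, t) in enumerate(zip(est, true)):
            errors[i].append((e - t) / t)
    return errors


def delta_method_sd(taxes, incomes, sigma):
    """Stochastic approach: sd of -2(c_i - sum_j w_j c_j) for independent c_j with sd sigma."""
    w = shares(taxes, incomes)
    sds = []
    for i, wi in enumerate(w):
        others = sum(wj ** 2 for j, wj in enumerate(w) if j != i)
        sds.append(2 * sigma * ((1 - wi) ** 2 + others) ** 0.5)
    return sds


# Hypothetical county (illustrative numbers only).
taxes = [120.0, 80.0, 200.0]
incomes = [4000.0, 5000.0, 6500.0]

sim = simulate_relative_errors(taxes, incomes, sigma=0.03, n_draws=4000, seed=1)
simulated_sd = [statistics.pstdev(e) for e in sim]
analytic_sd = delta_method_sd(taxes, incomes, sigma=0.03)
```

Under these assumptions the simulated and analytic standard deviations should agree closely; the simulation becomes the more convenient tool once constraints or correlated errors make the formulas unwieldy.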

In summary, errors in state-level data elements have substantial impact 
on substate allocations. Since substate data are used merely to divide a 
state’s allocation, any error in the allocation to the state must be borne by 
all the substate units. Analysis by Siegel (1975) shows that at the state 
level, errors in the censal estimates of per capita income are more significant 
(i.e., cause more dollars to be misallocated) than those in the population 
estimates. Since errors in the estimates of postcensal change are 
worse for per capita income than population, this same relationship holds 
for postcensal estimates of population and income. Spencer’s (1980) work 
suggests that per capita income errors are also more significant (i.e., 
cause more dollars to be misallocated) than population at the substate 
level. Since the allocations to substate units are more often based on per 
capita income than on population, errors in per capita income estimates 
also substantially affect more jurisdictions than do errors in population 
estimates. 


REFERENCES 

Bowditch, B., Horowitz, L., Jones, T., Pash, J., and Yates, J. (1974) Overview of Distribu- 
tion of Revenue Sharing Funds. Rockville, Md.: Westat, Inc. 

Nathan, R. P., Manvel, A. D., and Calkins, S. E. (1975) Monitoring Revenue Sharing. 
Washington, D.C.: The Brookings Institution. 

Office of Revenue Sharing (1973 et seq.) General Revenue Sharing. Final Data Elements. 
Washington, D.C.: U.S. Department of the Treasury. 

Robinson, J. G., and Siegel, J. S. (1979) Illustrative assessment of the impact of census 
underenumeration and income underreporting on revenue sharing allocations at the local 
level. 1979 Proceedings of the Social Statistics Section of the American Statistical Associa- 
tion. Washington, D.C.: American Statistical Association. 

Siegel, J. S. (1975) Coverage of Population in the 1970 Census and Some Implications for 
Public Programs. Bureau of the Census, Current Population Reports, Series P-23, No. 56. 
Washington, D.C.: U.S. Department of Commerce. 

Siegel, J. S., Passel, J. S., Rives, N. W., Jr., and Robinson, J. G. (1977) Developmental 
Estimates of the Coverage of the Population of States in the 1970 Census: Demographic 
Analysis. Bureau of the Census, Current Population Reports, Series P-23, No. 65. 
Washington, D.C.: U.S. Department of Commerce. 

Spencer, B. D. (1980) Benefit-Cost Analysis of Data Used to Allocate Funds. Lecture Notes 
in Statistics 3. New York: Springer-Verlag. 

Stanford Research Institute (1974) General Revenue Sharing Data Study. Vol. III, Vol. IV. 
Menlo Park, Calif.: Stanford Research Institute. 

Strauss, R. P., and Harkins, P. B. (1974) The impact of population undercounts on General 
Revenue Sharing allocations in New Jersey and Virginia. National Tax Journal 
XXVII:617-624. 

U.S. Congress, Joint Committee on Internal Revenue Taxation (1973) General Explanation 
of the State and Local Fiscal Assistance Act and the Federal-State Tax Collection Act of 
1972. Washington, D.C.: U.S. Government Printing Office. 



appendix 

F 


A Note on the 
Use of Postcensal 
Population Estimates 
in Employment and 
Unemployment Measures 

BRUCE D. SPENCER 


Postcensal estimates of population figure prominently in official measures 
of employment and unemployment for subnational areas. Their role is 
sketched here*; for further details, see Bureau of Labor Statistics (1977). 

It is convenient to define and distinguish between unemployment rates 
and unemployment ratios: the unemployment rate is the number of un- 
employed people divided by the sum of the number of employed and un- 
employed people; the unemployment ratio is the number of unemployed 
people divided by the population of working age; employment rates and 
ratios are defined analogously. Note that the sum of the employment rate 
and the unemployment rate for an area equals 1. 
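
The distinction between rates and ratios can be made concrete with a short sketch. The counts below are hypothetical and the function names are illustrative.

```python
def unemployment_rate(unemployed, employed):
    """Unemployed divided by the sum of employed and unemployed."""
    return unemployed / (employed + unemployed)


def employment_rate(unemployed, employed):
    """Employed divided by the sum of employed and unemployed."""
    return employed / (employed + unemployed)


def unemployment_ratio(unemployed, working_age_population):
    """Unemployed divided by the population of working age."""
    return unemployed / working_age_population


# Hypothetical area (illustrative numbers only).
employed, unemployed, working_age = 9200, 800, 16000

rate = unemployment_rate(unemployed, employed)        # 800 / 10000
ratio = unemployment_ratio(unemployed, working_age)   # 800 / 16000
rates_sum = rate + employment_rate(unemployed, employed)
```

Only the ratio has a population figure in its denominator, which is why an error in a population estimate enters ratios directly but enters rates only through the estimated counts.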

To estimate unemployment rates for states, the Bureau of Labor Statis- 
tics uses data from the Current Population Survey (cps), state unemploy- 
ment insurance (ui) records, and the decennial census. For the 10 largest 
states (and for New York City and Los Angeles Standard Metropolitan 
Statistical Areas) the unemployment rates are estimated directly on the 
basis of CPS data. For the other states, unemployment rates are estimated 
with information from the cps, from the decennial census, from tax 
reports of employers covered by the ui program, and from other sources. 
The BLS combines these diverse kinds of data according to complicated 
procedures, including the so-called “handbook” or “70-step” method; see 
Goldstein (1978) and National Commission on Employment and Unem- 
ployment Statistics (1979) for further description and references. 

*The purpose of this brief exposition is to illustrate some uses of the postcensal population 
estimates and not to discuss estimation of labor force parameters; for some alternative ways 
of estimating unemployment for local areas, see Gonzalez and Hoza (1978). 
To estimate the total number of employed and unemployed persons for 
the current month in each state, the Census Bureau multiplies the respec- 
tive sample ratio estimate from the cps by a population control. This con- 
trol is derived by extrapolation from the most recent July 1 postcensal 
population estimate for the civilian working-age population. 

The Bureau of Labor Statistics estimates the number of employed and 
unemployed people in each labor market area (lma) in a state by pro- 
rating the numbers for the state in proportion to the handbook estimates 
of employment and unemployment for the lma. An lma generally consists 
of a central city or cities and surrounding territory within commuting 
distance. Each lma comprises an integral number of counties (except in 
New England, where it comprises an integral number of towns). 

For many lma’s, estimated total employment is allocated among the 
counties in the lma in proportion to the postcensal estimates of the total 
population of the counties. But estimated total unemployment within an 
lma is allocated among constituent counties on the basis of data other 
than postcensal population estimates. Total employment for many incor- 
porated places of at least 2,500 population is estimated to equal the 
county employment times the ratio of the postcensal population estimate 
for the place to the population estimate for the county. Total unemploy- 
ment for the same incorporated places is estimated on the basis of data 
other than postcensal population estimates. For some incorporated 
places, mainly those that are newly incorporated or have changed their 
boundaries, both employment and unemployment are derived by prora- 
tion of county totals on the basis of postcensal population estimates. Since 
the unemployment rates for many counties and places are estimated by 
the ratio of the estimated total unemployment to the sum of the total 
employment and unemployment, the postcensal population estimates for 
counties and for incorporated places of at least 2,500 population affect 
these rates. 
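
A minimal sketch of this dependence follows. It assumes a hypothetical two-county lma, prorates total employment by the postcensal population estimates as described above, and takes county unemployment as given from other data; all numbers and names are illustrative.

```python
def prorate(total, weights):
    """Allocate a total among areas in proportion to the given weights."""
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]


def unemployment_rates(employment, unemployment):
    """Unemployment rate for each area: unemployed / (employed + unemployed)."""
    return [u / (e + u) for e, u in zip(employment, unemployment)]


# Hypothetical two-county labor market area (illustrative numbers only).
lma_employment = 50_000.0
county_unemployment = [2_000.0, 1_500.0]    # estimated from data other than population
population_estimates = [60_000.0, 40_000.0]

employment = prorate(lma_employment, population_estimates)
rates = unemployment_rates(employment, county_unemployment)

# Suppose the first county's population estimate is 5 percent too high.
biased_estimates = [63_000.0, 40_000.0]
biased_employment = prorate(lma_employment, biased_estimates)
biased_rates = unemployment_rates(biased_employment, county_unemployment)
```

Overestimating one county's population raises its prorated employment and lowers its unemployment rate, while the other county's rate moves in the opposite direction.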
Models for Error 
in Postcensal 
Population Estimates 

BRUCE D. SPENCER 


INTRODUCTION AND CONCLUSIONS 

The methodology underlying the postcensal population estimates is complex 
(see Appendix A), and a useful way to analyze the errors in these estimates 
is to construct models incorporating the components of error. Such 
models provide insight into the ways in which different kinds of errors 
(e.g., those arising from estimating a migration rate or a censal population) 
affect the estimates both of postcensal population and of postcensal 
change. We focus primarily on the effects of census undercoverage on the 
postcensal estimates; the error structure in the incremental administrative 
records method (ar) estimates is also investigated. 

Several approximations are required because nonlinear functions of 
random variables are analyzed and because simplicity in the models is 
desired. An important tool for analyzing the nonlinear functions of random 
variables is the delta method (see Bishop et al., 1975, pp. 486ff; 
[...], 1968, pp. 339-340; Rao, 1973, pp. 388ff). The notation $A \approx B$ is 
used to mean $A = B + e$, where $e$ is a remainder term arising from the 
delta method or from a simplifying assumption. The analysis is heuristic, 
and bounds for the remainder terms are not given. 

Although the present analysis is not complete (not all of the components 
of error are considered, and only parts of the estimation 
methodology are treated), some conclusions can be drawn (discussed 
below). 

1. The effect of census undercount on the national population update 
decreases slightly over time.^ The estimate of net national increase is unaf- 
fected by census undercount. 

2. The effect of census undercount on ratio-correlation (rc) estimates 
of state or county postcensal population does not decrease over time. For 
an area experiencing growth the estimate of net increase afforded by the 
ratio-correlation method becomes progressively more affected by census 
undercount as the postcensal interval gets longer. 

3. Net undercoverage in the census also affects subnational estimates of 
net migration made by either the component method II (cm ii) or the 
administrative records method; this effect tends to remain constant over 
time. 

4. The AR method is better used as a multiple-increment updating 
procedure than as a single-increment procedure. That is, to estimate 1975 
population with ar, it is better to estimate separately changes over 
1970-1972 and 1972-1975 and add them than to estimate the change over 
1970-1975 directly. 

Future work could usefully attempt to relax the simplifying assump- 
tions employed in this analysis; develop quantitative estimates of the 
moments of components of error; extend the scope of this analysis, so that 
more methodology is analyzed; and derive bounds for the remainder terms 
arising from approximations. 


NOTATION 

Capital letters with or without subscripts denote parameter values; 
estimates of the parameters are distinguished by a circumflex. Generally, 
the subscripts i, j, and k refer to states, counties, and subcounty areas, 
respectively. The letter t refers to time, measured in years, with t = 0 cor- 
responding to April 1, 1970. 

In particular, 

$P(t)$ [$\hat{P}(t)$]: true [estimated] total population of the U.S. at time $t$; 
$P_i(t)$ [$\hat{P}_i(t)$]: true [estimated] population of state $i$ at time $t$; 
$P_{ij}(t)$ [$\hat{P}_{ij}(t)$]: true [estimated] population of county $j$ in state $i$ at time $t$; 
$P_{ijk}(t)$ [$\hat{P}_{ijk}(t)$]: true [estimated] population of place $k$ in county $j$ in state $i$ 
at time $t$. 

^ By “effect” we mean contribution to the relative error in the parameter (here, total national 
population) being estimated. 

Errors in the estimates are denoted by lowercase letters: 

$$
\begin{aligned}
p(t) &= \hat{P}(t) - P(t) \\
p_i(t) &= \hat{P}_i(t) - P_i(t) \\
p_{ij}(t) &= \hat{P}_{ij}(t) - P_{ij}(t) \\
p_{ijk}(t) &= \hat{P}_{ijk}(t) - P_{ijk}(t).
\end{aligned}
$$

Net undercoverage rates for the censal population estimates are denoted 
by 

$$
\begin{aligned}
A &= -p(0)/P(0) \\
A_i &= -p_i(0)/P_i(0) \\
A_{ij} &= -p_{ij}(0)/P_{ij}(0) \\
A_{ijk} &= -p_{ijk}(0)/P_{ijk}(0).
\end{aligned}
$$

Jacob Siegel and colleagues at the Census Bureau have developed 
estimates of $A$ and $A_i$, but estimates with comparable reliability of $A_{ij}$ and 
$A_{ijk}$ are not available (see Bureau of the Census, 1977). 

Errors in the estimates of net increase are 

$$
\begin{aligned}
\Delta p(t) &= p(t) - p(0) = \hat{P}(t) - \hat{P}(0) - (P(t) - P(0)) \\
\Delta p(i; t) &= p_i(t) - p_i(0) = \hat{P}_i(t) - \hat{P}_i(0) - (P_i(t) - P_i(0)),
\end{aligned}
$$

with $\Delta p(i, j; t)$ and $\Delta p(i, j, k; t)$ similarly defined. 


A BASIC DECOMPOSITION 

The term “relative error” is applied to the ratio of the error to the appropriate 
parameter value; for example, $p(t)/P(t)$ is the relative error in the 
estimate of total U.S. population. It is often convenient to work with 
relative errors because the algebra is simple and because relative errors for 
different estimates may be comparable even though the parameters under 
estimation are not. 

The error in the postcensal population estimate decomposes into the error 
in the estimate of net increase and the error in the censal estimate: 

$$ p(t) = \Delta p(t) + p(0). \tag{G1} $$

Dividing by $P(t)$ yields 

$$ \frac{p(t)}{P(t)} = \frac{\Delta p(t)}{P(t)} + \frac{p(0)}{P(t)} = \frac{\Delta p(t)}{P(t)} - A\,\frac{P(0)}{P(t)}. \tag{G2} $$

Clearly, the relative error in the postcensal population estimate is the sum 
of the updating error divided by the population size and the negative of the 
net undercoverage rate times the ratio of census population to postcensal 
population. Similar relations hold for $P_i(t)$, $P_{ij}(t)$, and $P_{ijk}(t)$. 


EFFECT OF UNDERCOUNT ON NATIONAL UPDATES 

Relation (G2) suggests that if $P(t)$ increases over time and $\Delta p(t)$ is 
unaffected by census undercoverage, then as the postcensal interval gets 
longer, the effect of undercount decreases as the population increases (in 
percentage terms) since the last census. In fact, $P(t)$ increases over time, 
and $\Delta p(t)$ is not affected by undercoverage in the previous census because 
the estimate of net national increase is essentially derived from reported 
data on births, deaths, and net immigration since the previous census. We 
conclude that the effect of census undercount on the national postcensal 
population estimate decreases over time as the population increases (in 
percentage terms). 
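
A small numerical sketch of this conclusion follows. The census population, the 1 percent annual growth rate, and the 2.5 percent undercoverage rate are hypothetical, and the net increase is assumed to be estimated without error so that only the undercount term of (G2) remains.

```python
def undercount_effect(A, P0, Pt):
    """Contribution -A * P(0) / P(t) of census undercoverage to the relative error in (G2)."""
    return -A * P0 / Pt


# Hypothetical national figures (illustrative numbers only).
A = 0.025          # net undercoverage rate in the census
P0 = 200.0e6       # true population on the census date
growth = 0.01      # annual growth rate

rows = []
for t in range(0, 11):
    Pt = P0 * (1 + growth) ** t
    P0_hat = P0 * (1 - A)              # censal estimate with undercount
    Pt_hat = P0_hat + (Pt - P0)        # net increase assumed error-free
    relative_error = (Pt_hat - Pt) / Pt
    rows.append((t, relative_error, undercount_effect(A, P0, Pt)))
```

With positive growth the absolute value of the relative error shrinks each year, but only slowly: after ten years of 1 percent growth it is still about nine-tenths of the original undercoverage rate.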


EFFECT OF UNDERCOUNT ON 
SUBNATIONAL UPDATES 

Analogues to (G1) and (G2) hold for the subnational errors $p_i(t)$, $p_{ij}(t)$, 
and $p_{ijk}(t)$. However, at the subnational level one cannot conclude that the 
relative errors in the postcensal estimates become progressively less af- 
fected by undercoverage in the previous census. The explanation is to be 
found only partly in the declines in population experienced by some sub- 
national areas (including Rhode Island, New York, Pennsylvania, and the 
District of Columbia; see Bureau of the Census (1979)). The more inter- 
esting fact is that for some methods, differential undercoverage in the cen- 
sus affects the subnational estimates of net increase. 

RATIO-CORRELATION METHOD (RC) 

For the postcensal estimates obtained by the ratio-correlation method 
(rc), the effect of undercoverage does not change over time. Consider the 
RC estimate of postcensal population for county $j$ in state $i$. Defining the 
actual and estimated shares $X_{ij}$ and $\hat{X}_{ij}$ by 

$$ X_{ij} = \frac{P_{ij}(t)/P_{ij}(0)}{P_i(t)/P_i(0)}, \qquad \hat{X}_{ij} = \frac{\hat{P}_{ij}(t)/\hat{P}_{ij}(0)}{\hat{P}_i(t)/\hat{P}_i(0)}, \tag{G3} $$

where $\hat{X}_{ij}$ is obtained with the use of a regression equation (see Appendix A), 
$P_{ij}(t)$ is estimated by 

$$ \hat{P}_{ij}(t) = \hat{X}_{ij}\,\hat{P}_{ij}(0)\,\hat{P}_i(t)/\hat{P}_i(0). \tag{G4} $$

Application of the delta method to (G4) yields 

$$ \frac{p_{ij}(t)}{P_{ij}(t)} = \frac{x_{ij}}{X_{ij}} + \frac{p_{ij}(0)}{P_{ij}(0)} + \frac{p_i(t)}{P_i(t)} - \frac{p_i(0)}{P_i(0)}, \tag{G5} $$

where lowercase letters denote errors, e.g., $x_{ij} = \hat{X}_{ij} - X_{ij}$. Relation (G5) 
can also be expressed as 

$$ \frac{p_{ij}(t)}{P_{ij}(t)} = \frac{x_{ij}}{X_{ij}} + \frac{p_i(t)}{P_i(t)} + A_i - A_{ij}. \tag{G6} $$

Comparing (G6) with (G2), notice that in (G6), unlike (G2), the coefficients 
of the undercoverage terms $A_i$ and $A_{ij}$ do not change over time. 

To see the effect of census undercoverage on the rc estimates of net increase, 
one can subtract $p_{ij}(0)/P_{ij}(t)$ from both sides of (G5) and rearrange 
terms to obtain 

$$ \frac{\Delta p(i, j; t)}{P_{ij}(t)} = \frac{x_{ij}}{X_{ij}} + \frac{\Delta p(i; t)}{P_i(t)} + \frac{P_{ij}(t) - P_{ij}(0)}{P_{ij}(t)}\,\frac{p_{ij}(0)}{P_{ij}(0)} - \frac{P_i(t) - P_i(0)}{P_i(t)}\,\frac{p_i(0)}{P_i(0)}. \tag{G7} $$

If the proportional growth for the county equals that for the state, say, 

$$ \lambda = \frac{P_{ij}(t) - P_{ij}(0)}{P_{ij}(t)} = \frac{P_i(t) - P_i(0)}{P_i(t)}, $$

then (G7) becomes 

$$ \frac{\Delta p(i, j; t)}{P_{ij}(t)} = \frac{x_{ij}}{X_{ij}} + \frac{\Delta p(i; t)}{P_i(t)} + \lambda (A_i - A_{ij}). \tag{G8} $$

Relations (G7) and (G8) show the effect of undercoverage in the census on 
the RC estimates of net increase. In fact, if population growth is significant 
(because the annual rate of growth is high or because the time interval $t$ is 
long), then $\lambda$ can increase, and undercoverage in the census can have a 
progressively greater effect on the estimates of net increase. Similar 
relationships hold for state estimates. 
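
The persistence of the undercount effect can be illustrated numerically. The sketch below isolates the undercoverage terms of (G6) by assuming, hypothetically, that the regression share and the state postcensal estimate are both error-free; the populations, growth rates, and undercoverage rates are illustrative numbers only.

```python
def rc_estimate(x_hat, county_census, state_census, state_postcensal):
    """Ratio-correlation estimate (G4): X_hat * P_ij_hat(0) * P_i_hat(t) / P_i_hat(0)."""
    return x_hat * county_census * state_postcensal / state_census


# Hypothetical state and county (illustrative numbers only).
A_state, A_county = 0.02, 0.05              # net undercoverage rates
P_state0, P_county0 = 5_000_000.0, 200_000.0
state_growth, county_growth = 0.01, 0.03    # annual growth rates

rows = []
for t in range(1, 11):
    P_state_t = P_state0 * (1 + state_growth) ** t
    P_county_t = P_county0 * (1 + county_growth) ** t
    x_true = (P_county_t / P_county0) / (P_state_t / P_state0)
    estimate = rc_estimate(
        x_true,                          # share assumed estimated without error
        P_county0 * (1 - A_county),      # county census count with undercount
        P_state0 * (1 - A_state),        # state census count with undercount
        P_state_t,                       # state postcensal estimate assumed exact
    )
    rows.append((t, (estimate - P_county_t) / P_county_t))

first_order = A_state - A_county         # undercoverage terms of (G6)
```

Under these assumptions the relative error is the same in every year and close to $A_i - A_{ij}$, in contrast to the national case, where the undercount term of (G2) shrinks as the population grows.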


COMPONENT METHOD II AND ADMINISTRATIVE RECORD METHOD 
ESTIMATES OF NET INMIGRATION 

This section considers estimation of net migration for a county by either 
the component method II (cm ii) or the administrative records (ar) 
method. Because these methods treat the population over age 65 sepa- 
rately, it is convenient to use the term “elderly” to refer to any person born 
at least 65 years before the estimate date; any person who is not elderly is 
called “young.” Let the actual and estimated net migration rates over the 
postcensal period be denoted by $R_{ij}$ and $\hat{R}_{ij}$ for county $j$ in state $i$. The 
actual and estimated net numbers of migrants to state $i$ (to county $j$ in state 
$i$) over the postcensal period are represented by $M_i$ and $\hat{M}_i$ ($M_{ij}$ and $\hat{M}_{ij}$). 
Also, let $Y_i$ and $\hat{Y}_i$ ($Y_{ij}$ and $\hat{Y}_{ij}$) denote the actual and estimated young 
populations in state $i$ (county $j$ in state $i$) on the census date. (The 
difference $Y - \hat{Y}$ equals the undercount.) 

It is convenient to assume that $R_{ij}$, $M_i$, and $\hat{M}_i$ are all nonzero, that 
there is no international migration, and that there are no group quarters 
or military populations. We also assume, for illustrative purposes, that 
actual and estimated net natural increases are zero. 

The estimates $\hat{R}_{ij}$ are obtained from the matching of tax returns (in ar) 
or from changing patterns in school enrollments (in cm ii). Because the 
migration estimates for counties are adjusted to state totals in a complicated 
way (see Appendix A, sections 3.4e and 3.8), the estimates $\hat{M}_{ij}$ do 
not generally equal $\hat{Y}_{ij}\hat{R}_{ij}$. The “unadjusted” estimates of postcensal 
population, $\hat{Y}_{ij}(1 + \hat{R}_{ij})$, are scaled by a factor $\hat{\gamma}_i$ to equal the state 
estimate $\hat{Y}_i + \hat{M}_i$: 

$$ \hat{\gamma}_i = \frac{\hat{Y}_i + \hat{M}_i}{\sum_j \hat{Y}_{ij}(1 + \hat{R}_{ij})}. $$

The estimate of net migration for county $l$, say, is estimated as a residual: 

$$ \hat{M}_{il} = \hat{Y}_{il}(1 + \hat{R}_{il})\hat{\gamma}_i - \hat{Y}_{il} = \frac{\hat{Y}_{il}(1 + \hat{R}_{il})(\hat{Y}_i + \hat{M}_i)}{\sum_j \hat{Y}_{ij}(1 + \hat{R}_{ij})} - \hat{Y}_{il}. $$

The actual net number of migrants to county $l$ is given by 

$$ M_{il} = Y_{il}R_{il} = \frac{Y_{il}(1 + R_{il})(Y_i + M_i)}{\sum_j Y_{ij}(1 + R_{ij})} - Y_{il}. $$

Suppose, for simplicity, that the actual and estimated net migration 
rates $R_{ij}$ and $\hat{R}_{ij}$ are constant over counties, so that 

$$ \hat{M}_{il} = \frac{\hat{Y}_{il}(\hat{Y}_i + \hat{M}_i)}{\hat{Y}_i} - \hat{Y}_{il} = \frac{\hat{Y}_{il}\hat{M}_i}{\hat{Y}_i} $$

and, similarly, that 

$$ M_{il} = Y_{il}(M_i/Y_i). $$

Using the delta method, we can easily obtain 

$$ \frac{m_{il}}{M_{il}} = \frac{m_i}{M_i} + \left(\frac{y_{il}}{Y_{il}} - \frac{y_i}{Y_i}\right), \tag{G9} $$

where lowercase letters represent errors, e.g., $m_i = \hat{M}_i - M_i$. Clearly, the 
relative error in the county estimate of net migration arises partly from error 
in the state estimate of net migration and partly from the differential 
(across counties) census undercoverage of the young population. Moreover, 
the two components of error are equally important. 
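
Relation (G9) can be checked against a direct calculation. In the sketch below the county and state populations, the net migration total, and the relative errors are hypothetical, and migration rates are taken to be constant over counties as in the simplified case above.

```python
def county_net_migration(y_county, y_state, m_state):
    """County net migration under constant rates: Y_il * M_i / Y_i."""
    return y_county * m_state / y_state


def g9(m_state_rel, y_county_rel, y_state_rel):
    """First-order relative error in the county estimate, following (G9)."""
    return m_state_rel + (y_county_rel - y_state_rel)


# Hypothetical true values (illustrative numbers only).
Y_county, Y_state, M_state = 10_000.0, 100_000.0, 5_000.0

# Hypothetical relative errors: undercount of the young population is heavier
# in the county than in the state, and state net migration is overestimated.
y_county_rel, y_state_rel, m_state_rel = -0.04, -0.02, 0.03

M_county = county_net_migration(Y_county, Y_state, M_state)
M_county_hat = county_net_migration(
    Y_county * (1 + y_county_rel),
    Y_state * (1 + y_state_rel),
    M_state * (1 + m_state_rel),
)

exact = (M_county_hat - M_county) / M_county
approx = g9(m_state_rel, y_county_rel, y_state_rel)
```

The state migration error and the differential undercoverage enter with equal weight, and the exact and first-order figures differ only by second-order terms.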

If we now allow the net migration rates $R_{ij}$ and $\hat{R}_{ij}$ to vary over counties, 
we can in like manner derive 

$$ \frac{m_{il}}{M_{il}} = \frac{F_i}{F_{il}}\,\frac{m_i}{M_i} + \left(\frac{y_{il}}{Y_{il}} - \frac{F_i}{F_{il}}\,\frac{y_i}{Y_i}\right) + \left(\frac{r_{il}}{R_{il}} - \frac{F_i}{F_{il}}\,\frac{r_i}{R_i}\right), \tag{G10} $$

where the weights $F_i$ and $F_{il}$ are defined by $F_i = M_i/(Y_i + M_i)$ and $F_{il} = 
M_{il}/(Y_{il} + M_{il})$, and where the state migration rate and error are $R_i = 
M_i/Y_i$ and $r_i = \hat{R}_i - R_i$. Here the error in the county estimate arises from 
error in the state estimate of net migration, from a (weighted) undercoverage 
differential for the young populations, and from a (weighted) differential 
between the relative error for the county migration rate and that 
for the statewide average. If the county and the state grow at the same 
rate, the effect of census undercoverage on the estimate of net migration 
for the county remains constant over time. 

Net natural increase affects the estimates of net inmigration in two 
ways: the adjustment factor $\gamma_i$ has a more complicated form, and the 
“base” population by which the migration rate $R_{ij}$ is multiplied is no 
longer $Y_{ij}$ but rather $Y_{ij}$ plus one-half the net natural increase of the young 
population. The presence of group quarters populations or international 
migration would also affect the estimates of net inmigration (for the first 
reason above). However, the methods used to derive (G9) and (G10) can 
still be applied to decompose the error in the net inmigration estimate 
when such other components of change are taken into account. The 
decompositions are straightforward but tedious to derive and are not given 
here. 


SINGLE-INCREMENT VERSUS MULTIPLE-INCREMENT 
ADMINISTRATIVE RECORDS ESTIMATES 

The administrative records method is generally used as a multiple-increment 
updating procedure. For example, the Census Bureau obtained 
estimates of 1975 population by separately estimating population changes 
from 1970 to 1972 and 1972 to 1975 and adding the sum of these changes 
to the 1970 censal population estimate. A single-increment method would 
have estimated the change from 1970 to 1975 in one step. The question of 
whether a multiple-increment updating procedure is superior to a single-increment 
updating procedure is examined analytically below. The 
analysis suggests that the multiple-increment updating procedures are 
superior (have smaller bias and variance) to the single-increment 
procedures. The analysis is heuristic, however, and utilizes many simplifying 
assumptions. Future research could examine the sensitivity of the conclusion 
to modifications of the assumptions. In particular, empirical studies 
comparing updates by the two kinds of procedures should be done. 

Let times $0$, $s$, $t$ satisfy $0 < s < t$, where $0$ is the time of the last census, 
$t$ is the time of the current postcensal estimate, and $s$ is the time of an 
intermediate postcensal estimate. The multiple-increment procedure 
estimates population change over $[0, t)$ by separately estimating changes 
over $[0, s)$ and $[s, t)$, while the single-increment procedure estimates 
change over $[0, t)$ in one step. [...] 
deaths, international immigration, and special populations. Thus all 
population change arises from net internal migration. Consider 

$P$: actual population at time 0; 

$S$: actual number of nonmovers (stayers); 

$I$: actual number of inmigrants; 

$E$: actual number of outmigrants; 

$M$: actual net number of inmigrants ($M = I - E$). 

Notice that $P = S + E$. Denote the counts of nonmovers, inmigrants, and 
outmigrants provided by the matching of irs tax returns by $S'$, $I'$, and $E'$, 
and denote the estimate of $P$ by $\hat{P}$. Now let the random variables $C$, $c_I$, and 
$c_E$ be defined by 

$$
\begin{aligned}
S'/S &= C \\
I'/I &= C(1 + c_I) \\
E'/E &= C(1 + c_E).
\end{aligned}
$$

Here $C$ is the coverage ratio for nonmovers and $c_I$ and $c_E$ are the relative 
deviations of the coverage ratios of inmigrants and outmigrants from $C$. 
Let $p = \hat{P} - P$ be the error in $\hat{P}$. 

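Now consider the ar estimate $\hat{M}$ of $M$, 

$$ \hat{M} = \frac{I' - E'}{S' + E'}\,\hat{P}, \tag{G11} $$

and note that 

$$ \hat{M} = \frac{CI(1 + c_I) - CE(1 + c_E)}{SC + EC(1 + c_E)}\,P(1 + p/P) = \frac{M + Ic_I - Ec_E}{P(1 + Ec_E/P)}\,P(1 + p/P). $$

Application of the delta method gives 

$$ \frac{\hat{M} - M}{M} = c_I\,I/M - c_E(1 + M/P)E/M + p/P. \tag{G12} $$

For simplicity of analysis we assume that $c_I$, $c_E$, and $p$ are mutually 
uncorrelated. The relative variance of $\hat{M}$ is thus given by 

$$ \operatorname{Var}\left(\frac{\hat{M}}{M}\right) = \left(\frac{I}{M}\right)^2 \sigma_I^2 + \left(\frac{E}{M}\right)^2 (1 + M/P)^2 \sigma_E^2 + \sigma_P^2, \tag{G13} $$

where $\sigma_I^2$, $\sigma_E^2$, and $\sigma_P^2$ are the variances of $c_I$, $c_E$, and $p/P$, respectively. The 
relative variance of the postcensal population estimate $\hat{P} + \hat{M}$ is given by 

$$ \operatorname{Var}\left(\frac{\hat{P} + \hat{M}}{P + M}\right) = \left(\frac{I}{P + M}\right)^2 \sigma_I^2 + \left(\frac{E}{P + M}\right)^2 (1 + M/P)^2 \sigma_E^2 + \sigma_P^2. \tag{G14} $$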
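
The delta-method result (G12) can be compared with the exact error of the estimator (G11) in a short sketch. The population, migration flows, coverage ratio, and error values below are hypothetical, and the function names are illustrative rather than part of the Census Bureau's procedure.

```python
def ar_net_migration(S_obs, I_obs, E_obs, P_hat):
    """AR estimate (G11): matched net migration rate times the base population estimate."""
    return (I_obs - E_obs) / (S_obs + E_obs) * P_hat


def g12(I, E, M, P, c_I, c_E, p_rel):
    """First-order relative error of the AR estimate, following (G12)."""
    return c_I * I / M - c_E * (1 + M / P) * E / M + p_rel


def g13(I, E, M, P, var_I, var_E, var_P):
    """Relative variance of the AR estimate for uncorrelated errors, following (G13)."""
    return (I / M) ** 2 * var_I + (E / M) ** 2 * (1 + M / P) ** 2 * var_E + var_P


# Hypothetical area (illustrative numbers only).
P, I, E = 100_000.0, 8_000.0, 5_000.0
S, M = P - E, I - E
C, c_I, c_E, p_rel = 0.8, 0.03, -0.02, -0.025

M_hat = ar_net_migration(C * S, C * (1 + c_I) * I, C * (1 + c_E) * E, P * (1 + p_rel))
exact = (M_hat - M) / M
approx = g12(I, E, M, P, c_I, c_E, p_rel)

# The common coverage ratio C cancels in (G11); only the deviations c_I and c_E matter.
M_hat_low_coverage = ar_net_migration(0.5 * S, 0.5 * (1 + c_I) * I,
                                      0.5 * (1 + c_E) * E, P * (1 + p_rel))
```

Because gross flows are much larger than the net flow, small differential coverage errors for movers are magnified by the factors $I/M$ and $E/M$, which is visible in both the exact and the approximate figures.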


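We can now compare the variances of the single-increment and double-increment 
procedures. Let subscripts $0s$, $0t$, and $st$ refer to the ends of the 
intervals $[0, s)$, $[0, t)$, and $[s, t)$; for example, $M_{st}$ denotes the actual net 
number of migrants over the period $[s, t)$. Similarly, $\sigma_{Ist}^2$ and $\sigma_{Est}^2$ are the 
variances of $c_I$ and $c_E$ for matches of tax returns between times $s$ and $t$. 
The relative variances of the estimates of population at times $0$ and $s$ are 
written $\sigma_{P0}^2$ and $\sigma_{Ps}^2$. We assume that $c_{Ist}$ and $c_{Est}$ are not correlated with 
$\hat{P}_s$. 

It follows from (G14) that the relative variance of the single-increment 
update is given by 

$$ \operatorname{Var}\left(\frac{\hat{P}_0 + \hat{M}_{0t}}{P_t}\right) = \left(\frac{I_{0t}}{P_t}\right)^2 \sigma_{I0t}^2 + \left(\frac{E_{0t}}{P_t}\right)^2 (1 + M_{0t}/P_0)^2 \sigma_{E0t}^2 + \sigma_{P0}^2. \tag{G15} $$

To derive the variance of the double-increment update, we first note that 
the relative variance of $\hat{P}_s$ is 

$$ \sigma_{Ps}^2 = \left(\frac{I_{0s}}{P_s}\right)^2 \sigma_{I0s}^2 + \left(\frac{E_{0s}}{P_s}\right)^2 (1 + M_{0s}/P_0)^2 \sigma_{E0s}^2 + \sigma_{P0}^2. \tag{G16} $$

As in (G13) the relative variance of $\hat{M}_{st}$ is 

$$ \operatorname{Var}\left(\frac{\hat{M}_{st}}{M_{st}}\right) = \left(\frac{I_{st}}{M_{st}}\right)^2 \sigma_{Ist}^2 + \left(\frac{E_{st}}{M_{st}}\right)^2 (1 + M_{st}/P_s)^2 \sigma_{Est}^2 + \sigma_{Ps}^2, \tag{G17} $$

and the relative variance of the double-increment update is thus 

$$ \operatorname{Var}\left(\frac{\hat{P}_s + \hat{M}_{st}}{P_t}\right) = \left(\frac{I_{st}}{P_t}\right)^2 \sigma_{Ist}^2 + \left(\frac{E_{st}}{P_t}\right)^2 (1 + M_{st}/P_s)^2 \sigma_{Est}^2 + \sigma_{Ps}^2. \tag{G18} $$

The variance of the single-increment update minus that of the double-increment 
update (that is, $P_t^2$ times the difference between (G15) and (G18)) is approximately 

$$
\begin{aligned}
& I_{0t}^2 \sigma_{I0t}^2 + E_{0t}^2 (1 + M_{0t}/P_0)^2 \sigma_{E0t}^2 - I_{st}^2 \sigma_{Ist}^2 - E_{st}^2 (1 + M_{st}/P_s)^2 \sigma_{Est}^2 \\
& \quad - (P_t/P_s)^2 \left[ I_{0s}^2 \sigma_{I0s}^2 + E_{0s}^2 (1 + M_{0s}/P_0)^2 \sigma_{E0s}^2 \right].
\end{aligned}
\tag{G19}
$$

For simplicity, suppose there is no outmigration, that inmigration is 
linear over time, and that $P$ is large in comparison to $I$; i.e., $0 = E_{0t} = 
E_{0s} = E_{st}$, $I_{0t} = tI$, $I_{0s} = sI$, $I_{st} = (t - s)I$, and $P_t/P_s$ is approximately 
unity. Then (G19) is approximately equal to 

$$ I^2 \{ t^2 \sigma_{I0t}^2 - s^2 \sigma_{I0s}^2 - (t - s)^2 \sigma_{Ist}^2 \}. \tag{G20} $$

For many specifications of $\sigma_{I0s}^2$, $\sigma_{I0t}^2$, and $\sigma_{Ist}^2$, expression (G20) will be 
positive, indicating that the double-increment procedure has (subject to 
the assumptions above) smaller variance. For example, if $\sigma_{Ist}^2$ has the 
form $a + b(t - s)^d$ for positive constants $a$ and $d$ and nonnegative $b$, then 
(G20) is positive. 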

The biases in the two procedures can be analyzed in similar fashion. 
The analog of (G20) for biases is 

$$ I \{ t \mu_{I0t} - s \mu_{I0s} - (t - s) \mu_{Ist} \}, \tag{G21} $$

where $\mu_I$ is the mean of $c_I$. If $\mu_{Ist}$ is constant over possible values of $s$ and $t$, 
then (G21) is zero, so the biases in the single- and multiple-increment 
updates are approximately the same. If $\mu_{Ist}$ has the form $a + b(t - s)^d$, 
where $b$ and $d$ are positive (negative) constants, then the biases in both 
updating procedures will be positive (negative), but the absolute bias in the 
single-increment update will be larger. Generally, the absolute bias of the 
single-increment update is believed to be at least as large as that of the 
double-increment update. 



some of the simplifying assumptions used, especially that of the lack of 
covariances. It would be especially useful to compare the conclusions of 
this analysis with the results of empirical tests of accuracy of the two kinds 
of updates. 
Evaluation Design and the 
Use of Low-Precision 
Benchmarks 

CARL N. MORRIS 


If one is fortunate enough to know the “true values,” say,
$\theta_1, \ldots, \theta_n$, of some parameters in $n$ areas, then one can
use those values to evaluate alternative estimators of them. If one of the
estimators takes the values $Y = (Y_1, \ldots, Y_n)$, then a measure of the
accuracy of $Y$ is 

$$L(\theta, Y) = \sum_{i=1}^{n} (Y_i - \theta_i)^2.$$

Other loss functions are possible, of course, as was noted in section 3.1 of 
the report. For example, it may be appropriate to weight the squares in 
the above formula to reflect the size of an area, etc., but these points are 
ignored here in order to illustrate the issues of interest. 

Usually, $\theta$ is unobservable, but instead one has
$\theta_1^*, \ldots, \theta_n^*$, which are other estimates of
$\theta_1, \ldots, \theta_n$ and have their own variances. Assume that,
independently, 

$$\theta_i^* \sim N(\theta_i, v_i), \qquad i = 1, \ldots, n.$$

If $v_i = 0$, then $\theta_i^* = \theta_i$, but otherwise $\theta_i^*$ is
unbiased for $\theta_i$ with variance $v_i > 0$. If $v_i$ is small in
comparison to the variance of $Y_i$, one might call $\theta_i^*$ a
“high-precision” estimate or benchmark, while larger values of $v_i$ would
have “low precision” (high variance). 

So long as $\theta^*$ is independent of $Y$, then given $Y$, 

$$E\,L(\theta^*, Y) = E\Big[\sum_{i=1}^{n} (Y_i - \theta_i^*)^2 \,\Big|\, Y\Big] = \sum_{i=1}^{n} \big[(Y_i - \theta_i)^2 + v_i\big],$$

so that one cannot estimate the true loss for $Y$ by substituting
$\theta_i^*$ (or $\theta^*$) in $L(\theta, Y)$. Instead one might use the
unbiased estimator 

$$L^*(\theta^*, Y) = \sum_{i=1}^{n} \big\{(Y_i - \theta_i^*)^2 - v_i\big\}$$

of $L(\theta, Y)$. 
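
The bias of the uncorrected loss can be checked by simulation. The following
is a minimal sketch, not part of the Panel's analysis: it holds the true
values and the estimates Y fixed, redraws normally distributed benchmarks
many times, and compares the average naive loss L(theta*, Y) with the
average corrected loss L*(theta*, Y). All numerical settings (number of
areas, variance ranges, seed) and the function names are hypothetical.

```python
import random


def true_loss(theta, y):
    """L(theta, Y): sum of squared errors against the true values."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, theta))


def naive_loss(theta_star, y):
    """L(theta*, Y): benchmark substituted for the truth, no correction."""
    return sum((yi - ts) ** 2 for yi, ts in zip(y, theta_star))


def corrected_loss(theta_star, y, v):
    """L*(theta*, Y): subtract each benchmark variance v_i from its term."""
    return sum((yi - ts) ** 2 - vi for yi, ts, vi in zip(y, theta_star, v))


def simulate(n=50, reps=2000, seed=0):
    """Hold theta and Y fixed, redraw the benchmarks, average both losses."""
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 10.0) for _ in range(n)]  # hypothetical truths
    v = [rng.uniform(0.5, 2.0) for _ in range(n)]       # benchmark variances
    y = [t + rng.gauss(0.0, 1.0) for t in theta]        # estimates evaluated
    naive_total = 0.0
    corrected_total = 0.0
    for _ in range(reps):
        # Each benchmark is unbiased for theta_i with variance v_i.
        theta_star = [t + rng.gauss(0.0, vi ** 0.5) for t, vi in zip(theta, v)]
        naive_total += naive_loss(theta_star, y)
        corrected_total += corrected_loss(theta_star, y, v)
    return (true_loss(theta, y), naive_total / reps,
            corrected_total / reps, sum(v))


if __name__ == "__main__":
    target, naive, corrected, total_v = simulate()
    print("true loss      :", round(target, 2))
    print("naive average  :", round(naive, 2))
    print("corrected avg  :", round(corrected, 2))
    print("sum of v_i     :", round(total_v, 2))
```

In such a run the naive average should exceed the true loss by roughly the
sum of the v_i, while the corrected average should lie close to the true
loss.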

How else must we account for the $v_i$? Two strategies often are followed
for areas with large $v_i$. Strategy 1 ignores all areas in the evaluation
with $v_i$ exceeding some threshold; strategy 2 uses the areas with large
$v_i$ by clustering them so that the combination of areas has sufficiently
small variance. I wish to observe that neither of these strategies is
optimal, although each may be convenient and appropriate at times. 

Strategy 1 violates the principle of sufficiency. Additional information 
can always be used profitably, even if the information is quite imperfect. 
In this case, ignoring areas with low-precision benchmarks is inferior to 
using them, but with lower weight. The appropriate weights depend on 
the variability of the $Y_i$ as well as the $v_i$. 

For example, suppose $v_i = V_1$ for $i = 1, \ldots, N_1$ and
$v_i = V_2 > V_1$ for $i = N_1 + 1, \ldots, N_1 + N_2$. Then the first $N_1$
areas are (relatively) high precision, the last $N_2$ are low precision, and
$N_1 + N_2 = n$ areas are available for evaluation. Suppose
$Y_i \sim N(\theta_i, W)$ with $W$ unknown. Then $W$ measures the precision
of $Y_i$, and if $\theta_1, \ldots, \theta_n$ were known ($V_1 = V_2 = 0$),
one would compute 

$$L(\theta, Y) = \sum_{i=1}^{n} (Y_i - \theta_i)^2,$$

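which estimates $nW$. When the $\theta_i^*$ must be used, it is better to
use the mixed estimator 

$$\frac{a}{N_1} \sum_{i=1}^{N_1} \big[(Y_i - \theta_i^*)^2 - V_1\big] + \frac{1 - a}{N_2} \sum_{i=N_1+1}^{N_1+N_2} \big[(Y_i - \theta_i^*)^2 - V_2\big]$$

to estimate $W$ than to use $(1/N_1)\sum[(Y_i - \theta_i^*)^2 - V_1]$ alone,
provided one chooses $a$ to be the optimal value: 

$$a = Q_2/(Q_1 + Q_2),$$

where 

$$Q_1 = \operatorname{Var}\Big\{\frac{1}{N_1} \sum_{i=1}^{N_1} \big[(Y_i - \theta_i^*)^2 - V_1\big]\Big\} = \frac{2(W + V_1)^2}{N_1}$$

and 

$$Q_2 = \operatorname{Var}\Big\{\frac{1}{N_2} \sum_{i=N_1+1}^{N_1+N_2} \big[(Y_i - \theta_i^*)^2 - V_2\big]\Big\} = \frac{2(W + V_2)^2}{N_2}.$$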

($W$ must be known to compute this, but the maximum likelihood estimate 
of $W$ can be used in this manner and causes no undue complication.) 
Thus the low-precision estimates are as useful as the high-precision 
estimates, provided 

$$N_2 = \Big(\frac{W + V_2}{W + V_1}\Big)^2 N_1,$$

and the relative efficiency of a low-precision estimate in this example is 

$$\text{Effic} = \Big(\frac{W + V_1}{W + V_2}\Big)^2.$$

Note that if $V_2$ is small with respect to $W$, low-precision estimates are 
nearly as good as high-precision ones. We usually expect $V_1$ to be
considerably less than $W$ and $V_2$ to be on the order of $W$. Then low-precision 
estimates would have about one-fourth the efficiency of high-precision 
estimates, but the value of such information cannot be denied. More 
general examples can be constructed. The main point is that strategy 1, 
which ignores low-precision estimates, can be costly, especially if such 
estimates outnumber high-precision estimates. 
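
The weighting argument above can be made concrete with a few lines of
arithmetic. The sketch below assumes, as in the example, that
Y_i - theta_i* is normal with variance W + V, so that each bias-corrected
average has variance 2(W + V)^2 / N; the function names and the numerical
inputs are illustrative only.

```python
def average_variance(w, v, n_areas):
    """Variance of (1/N) * sum[(Y_i - theta_i*)^2 - V] under normality."""
    return 2.0 * (w + v) ** 2 / n_areas


def optimal_weight(w, v1, n1, v2, n2):
    """Weight a placed on the high-precision average: a = Q2 / (Q1 + Q2)."""
    q1 = average_variance(w, v1, n1)
    q2 = average_variance(w, v2, n2)
    return q2 / (q1 + q2)


def mixed_variance(w, v1, n1, v2, n2):
    """Variance of a * (high-precision avg) + (1 - a) * (low-precision avg)."""
    q1 = average_variance(w, v1, n1)
    q2 = average_variance(w, v2, n2)
    a = q2 / (q1 + q2)
    return a ** 2 * q1 + (1.0 - a) ** 2 * q2


def relative_efficiency(w, v1, v2):
    """Worth of one low-precision area measured in high-precision areas."""
    return ((w + v1) / (w + v2)) ** 2


if __name__ == "__main__":
    # Hypothetical case: V1 negligible, V2 on the order of W.
    w, v1, v2 = 1.0, 0.0, 1.0
    print("relative efficiency:", relative_efficiency(w, v1, v2))
    print("optimal a (N1=10, N2=40):", optimal_weight(w, v1, 10, v2, 40))
    print("variance, high-precision only:", average_variance(w, v1, 10))
    print("variance, mixed estimator    :", mixed_variance(w, v1, 10, v2, 40))
```

With these inputs, forty low-precision areas carry the same information as
the ten high-precision areas, so discarding them (strategy 1) would double
the variance of the estimate of W.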

The second evaluation strategy, strategy 2, pools several low-precision 
estimates to produce fewer high-precision ones. This method is efficient if 
areas having the same mean are grouped. Otherwise, serious biases within 
these groupings will go undetected because only the average performance 
of the various Yi associated with the groupings will be evaluated. Low- 
precision estimates may frequently correspond to small areas, so willing- 
ness to use low-precision benchmarks may provide genuine benefits for 
evaluation of small-area estimates. 

The comments made in this section fall under the general heading of 
developing good evaluation strategies. The Panel has recommended that 
the Bureau of the Census evaluate its procedures whenever possible. The 
Census Bureau should consider carefully its methods for making evaluations,
for this is not a well-charted terrain, and the utility of an evaluation 
Effect of Biases 
in Census Estimates 
on Evaluation of 
Postcensal Estimates 

BRUCE D. SPENCER 


Error in the decennial and special census estimates of population and in- 
come confounds the evaluation of both estimates of postcensal level and of 
postcensal change. This error arises from net census undercoverage and 
from underreporting of income. This appendix focuses on the effects of 
undercoverage bias on the population estimates, although analysis of 
biases of income estimates would be similar. In particular, this appendix 
delineates precisely how undercount affects the evaluations based on 
decennial or special censuses: 

1. Use of the difference between the postcensal estimate and the 
(special) census count to estimate the error in the former underestimates 
this error because it ignores the bias in the special census counts. 

2. Use of the difference between the postcensal estimate and the 
(special) census count to estimate the error in the estimate of postcensal 
change is also affected by undercoverage, but to a lesser degree. Even if 
the undercoverage rates for the base-year census (at the beginning of the 
postcensal period) and the (special) census used for evaluation are the 
same, the undercoverage will affect the evaluations. 

Consider estimation for an arbitrary geographic unit and for time $t$, 
with 

$P_t$ true value of population; 

$\hat{P}_t$ postcensal estimate of population; 

$\tilde{P}_t$ census estimate (decennial or special) of population. 



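Time $t = 0$ refers to the previous decennial census, and by convention, the
postcensal estimate $\hat{P}_0$ equals the census estimate $\tilde{P}_0$. The
undercoverage rate $\alpha_t$ is defined by 

$$\alpha_t = (P_t - \tilde{P}_t)/P_t,$$

and the error in the postcensal estimate of net change, $\Delta_t$, is given by 

$$\Delta_t = \hat{P}_t - \hat{P}_0 - (P_t - P_0).$$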

Estimates of statewide net undercoverage rates for 1970 range from less 
than zero (estimated net overcount for Wisconsin and Utah) to more than 
0.07 (for New Mexico, Arkansas, and Alaska) (see Bureau of the Census, 
1977, Table VII-D). Of course, substate rates cannot vary less than state 
rates. 

The usual evaluation studies of postcensal estimates use the difference 
between the postcensal estimate and a current census estimate, 

$$\hat{P}_t - \tilde{P}_t, \qquad \text{(I1)}$$

to measure error in either the estimate of postcensal level, $\hat{P}_t$, or
the estimate of postcensal change, $\hat{P}_t - \hat{P}_0$. In general, (I1)
does not provide a good estimate of the error in the estimate of level, 

$$\hat{P}_t - P_t, \qquad \text{(I2)}$$

because of undercoverage in $\tilde{P}_t$. To see this, note that (I1) can
also be expressed as 

$$\hat{P}_t - P_t + \alpha_t P_t,$$

which shows that the net undercoverage $\alpha_t P_t$ may well exceed the
actual error, (I2). 

For example, in section 2.2 of the report, the Panel analyzes the accuracy
of county estimates by using the average over counties of the absolute
relative differences 

$$|\hat{P}_t - \tilde{P}_t| / \tilde{P}_t \qquad \text{(I3)}$$

between postcensal estimates and special census estimates to support
inferences about the expected value of 

$$|\hat{P}_t - P_t| / P_t. \qquad \text{(I4)}$$

The average of (I3) for 133 counties with special censuses taken during 
1974-1976 ranged from 0.039 to 0.064 for different methods (see Table 
2.3). However, one must be guarded in inferring that the mean value of 
(I4) lies in this range, because the magnitude of $\alpha_t$ is often
comparable to or larger than (I3). In other words, unless the undercoverage
rate for an area is much smaller than the relative difference, (I3), between
the estimate and the census count, the value of (I3) will tend to grossly
overestimate the true relative absolute error (I4). 

To avoid this problem, one might consider using (I1) to estimate $\Delta_t$
or using (I3) to estimate $|\Delta_t|/P_t$. In general, this use of (I1) or
(I3) is less sensitive to the presence of undercoverage than is the use of
(I1) for estimating (I2) or the use of (I3) for estimating (I4). Even if one
could assume that undercoverage is constant over time (that $\alpha_0$ and
$\alpha_t$ are the same), however, the use of (I1) and (I3) for making
inferences about $\Delta_t$ would still in fact be sensitive to the level of
undercoverage. Better inferences about the properties of $\Delta_t$ can be
made by taking undercoverage into consideration. 

Some decompositions of error will be useful. Observe that 

$$\hat{P}_t - \tilde{P}_t = \Delta_t + \alpha_t P_t - \alpha_0 P_0. \qquad \text{(I5)}$$

Letting 

$$\epsilon_t = P_t(\alpha_t - \alpha_0), \qquad \gamma_t = \alpha_0(P_t - P_0),$$

one obtains 

$$\hat{P}_t - \tilde{P}_t = \Delta_t + \gamma_t + \epsilon_t. \qquad \text{(I6)}$$

Here $\epsilon_t$ is the effect of differences between the base year
decennial census undercoverage rates and the later special (or decennial)
census undercoverage rates, and $\gamma_t$ is the joint effect of population
change and decennial census undercoverage in the base year. Little is known
about how undercoverage rates in special censuses compare to those in
decennial censuses. Studies (Bureau of the Census, 1973) indicate that the
national undercoverage rates for the 1960 and 1970 decennial censuses
differed by 0.002, but how much the rates for subnational areas changed is
unknown. On the other hand, we do know that $\gamma_t$ can be large for
places that are growing or declining substantially and that have
moderate-to-large undercoverage rates. 
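
The decomposition can be verified numerically. The following sketch uses
invented figures for a single hypothetical area; the function name and all
inputs are assumptions made for illustration. It computes the error in
estimated change, the two undercoverage terms, and the observed difference
between the postcensal estimate and the census count, and confirms that the
observed difference equals the sum of the three components.

```python
def decompose(p_true_0, p_true_t, alpha_0, alpha_t, p_hat_t):
    """Split (postcensal estimate - census count) into delta, gamma, epsilon.

    p_true_0, p_true_t : true populations at time 0 and time t
    alpha_0, alpha_t   : census undercoverage rates at time 0 and time t
    p_hat_t            : postcensal estimate for time t
    """
    census_0 = p_true_0 * (1.0 - alpha_0)   # census count at time 0
    census_t = p_true_t * (1.0 - alpha_t)   # census count at time t
    p_hat_0 = census_0                      # convention: estimate = census at 0
    delta = (p_hat_t - p_hat_0) - (p_true_t - p_true_0)
    gamma = alpha_0 * (p_true_t - p_true_0)
    epsilon = p_true_t * (alpha_t - alpha_0)
    observed = p_hat_t - census_t
    return {"delta": delta, "gamma": gamma,
            "epsilon": epsilon, "observed": observed}


if __name__ == "__main__":
    # Hypothetical area: 40 percent true growth, 5 percent undercoverage in
    # both censuses, and a postcensal method that measures change exactly.
    parts = decompose(100000.0, 140000.0, 0.05, 0.05, 135000.0)
    for name, value in parts.items():
        print(name, round(value, 2))
```

In this constructed case the method has no error in estimated change, yet
the observed difference is about 2,000 persons, all of it attributable to
the growth-times-undercoverage term.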
Let us now assume $\epsilon_t = 0$ and focus on $\gamma_t$. A possible way to
improve estimates of $\Delta_t$ is to remove the estimated effect of
$\gamma_t$. Notice that 

$$\tilde{P}_t - \tilde{P}_0 = (1 - \alpha_0)(P_t - P_0) - \epsilon_t,$$

so that if an estimate $\hat{\alpha}_0$ of $\alpha_0$ is available, one can
estimate $\gamma_t$ by $\hat{\gamma}_t$, where 

$$\hat{\gamma}_t = \frac{\hat{\alpha}_0(\tilde{P}_t - \tilde{P}_0)}{1 - \hat{\alpha}_0}.$$

More sophisticated estimates of $\gamma_t$ could also be developed. Thus the
estimated effect of $\gamma_t$ would be removed if instead of using
$\hat{P}_t - \tilde{P}_t$ to estimate $\Delta_t$, one used 

$$\hat{P}_t - \tilde{P}_t - \hat{\gamma}_t. \qquad \text{(I7)}$$

A major difficulty with $\hat{\gamma}_t$ is the inaccuracy of the estimates
$\hat{\alpha}_0$. At the state level these estimates are questionable, and at
substate levels they are [...]. Nevertheless, for most places one can be
fairly confident that $\alpha_0$ is positive, say, $\alpha_0 > 0.01$. For
these places a cautious estimate of $\gamma_t$ would be 

$$\frac{0.01(\tilde{P}_t - \tilde{P}_0)}{0.99},$$

which is preferable to the present implicit use of $\hat{\gamma}_t = 0$.
Alternative approaches could estimate $\alpha_0$ for a substate area by an
estimate for the state or for the nation as a whole. Since the
population-weighted average of undercoverage rates for substate areas equals
the state undercoverage rate, a simple but reasonable approach consists of
estimating $\alpha_0$ for a substate area by the estimate for the whole
state. In fact, these estimates could be substantial and have a significant
effect on evaluations of methods. This effect is believed to be greatest when
an evaluation compares the postcensal estimate against a decennial census
count, that is, when 10-year updates are evaluated. 

For example, consider evaluating the postcensal estimates for Florida
counties. The estimates of proportional growth over 1970-1976 were more than
0.24 for more than half of Florida’s 67 counties. Extrapolating, one might
suppose that for $t$ referring to April 1, 1980, $P_t - P_0 > 0.4P_0$ for
[...] counties. For Florida, $\hat{\alpha}_0 \approx 0.05$, so that
$\hat{\gamma}_t/P_t$ could be near 0.016. This value is large: the difference
between the usual measures of accuracy (the average of (I3) for all counties)
for alternative estimation procedures may well be less than 0.016. 
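
A short calculation shows how the adjustment would be applied in practice.
This is a minimal sketch under the assumption that the special-census and
base-year undercoverage rates are equal; the function names and the county
figures are invented for illustration and are not taken from the report.

```python
def gamma_hat(census_t, census_0, alpha0_hat):
    """Estimated growth-times-undercoverage effect from the two census counts."""
    return alpha0_hat * (census_t - census_0) / (1.0 - alpha0_hat)


def adjusted_difference(p_hat_t, census_t, census_0, alpha0_hat):
    """Postcensal estimate minus census count, with gamma_hat removed."""
    return p_hat_t - census_t - gamma_hat(census_t, census_0, alpha0_hat)


if __name__ == "__main__":
    # Hypothetical fast-growing county: census counts of 95,000 and 133,000
    # and a postcensal estimate of 135,000.
    census_0, census_t, p_hat_t = 95000.0, 133000.0, 135000.0

    unadjusted = p_hat_t - census_t
    with_state_rate = adjusted_difference(p_hat_t, census_t, census_0, 0.05)
    with_cautious_rate = adjusted_difference(p_hat_t, census_t, census_0, 0.01)

    print("unadjusted difference      :", round(unadjusted, 1))
    print("adjusted, alpha0_hat = 0.05:", round(with_state_rate, 1))
    print("adjusted, alpha0_hat = 0.01:", round(with_cautious_rate, 1))
```

With an assumed rate of 0.05 the apparent overestimate of 2,000 vanishes;
with the cautious rate of 0.01 about one-fifth of it is removed.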

A possible result of correcting for $\gamma_t$ is a more realistic
sensitivity to bias. For example, if for fast growing places an estimating
methodology is unbiased and has small variance, use of (I1) rather than (I7)
to study the errors could lead to inferences that the estimates were biased
upward.¹ In this case, (I1) would be primarily estimating $\gamma_t$ rather
than $\Delta_t$. For another example, suppose that for a given class of
areas, two methods had biases with opposite signs: method A tended to
underestimate and method B to overestimate. In this example, use of (I1)
instead of (I7) will make method A appear better than it really is and will
make method B appear worse than it really is. 

In making inferences about the relative errors $\Delta_t/P_t$ one may
similarly divide (I7) by $\tilde{P}_t/(1 - \hat{\alpha}_0)$ rather than by
$\tilde{P}_t$. However, this extra adjustment for undercoverage will
generally have less impact than adjustment for $\gamma_t$. In particular,
division by $1/(1 - \hat{\alpha}_0)$ will have negligible impact on
comparisons between methods when $\hat{\alpha}_0$ is constant for all places. 

Some empirical study is needed to compare the measures (I1) and (I7) 
over a range of values. How sensitive to $\gamma_t$ are the estimates of
accuracy for different methodologies? Do the rankings of the methodologies 
change? Of course, the accuracy of $\gamma_t$ adjustments rests on that of the 
estimates of undercoverage, but by adjusting for undercoverage (as 
discussed above) one can expect some improvement in evaluation, given 
current knowledge about undercoverage. 

This analysis has focused on population estimates; income estimates 
can be handled similarly by drawing on knowledge of the effects of income 
underreporting and population undercoverage on income estimates. For 
both population and income, understanding how biases in census 
estimates affect the evaluation aids interpretations of the evaluations. 
¹ In fact (see section 2.2), it appears that estimates for fast growing
places are biased downward. Because these conclusions are based on the use of
(I1) rather than (I7), it is possible that the estimates for fast growing
places have even more downward bias than evaluations indicate. 
Stabilization by 
Empirical 
Bayes Methods 


CARL N. MORRIS 


The evaluation methods considered in sections 3.2 and 5.2 of the report 
are used to choose different procedures or to determine how to average 
two (or more) procedures. We have recommended that procedures be used 
that “best” predict sample data, for example, from independent Current 
Population Surveys. Sometimes good weights can be determined from 
data without an independent data set for evaluation. Fay and Herriot 
(1979) have identified such an application for the Census Bureau and 
showed the method works well for estimating per capita income in small 
areas in a census year. The method uses empirical Bayes modeling ap- 
proaches to generalize Stein’s estimator appropriately for that application 
(see Efron and Morris, 1975). We discuss the Fay-Herriot application here 
and suggest other census uses of this methodology. We then consider the 
relationship between empirical Bayes weights and weights determined 
from regression methods. 

In a census year, income is measured imperfectly for all areas because it 
is determined from a sample. Assuming that good sampling procedures 
have been followed, we consider the sample mean of income in each area 
as an unbiased estimate of the mean per capita income for the area. Let $Y_i$ 
be the sample mean in the $i$th area. In small areas, even accounting for 
finite population corrections, variances will be quite large. (A census 
would alleviate the problem of large variances, of course, but in 1970 
20-percent samples were used, and the 1980 census will collect data on
income for not more than 50 percent of the population in small areas.) Let 
$V_i$ be the variance of the sample mean $Y_i$. 

Instead of using the sample mean directly the Census Bureau can regress the
sample income estimates of small areas on other characteristics (given by the
matrix $X$) correlated with income (Fay and Herriot take these to be IRS
income and housing values) to derive an income predictor for each area, for
example, 

$$\hat{Y}_i = X_i'\hat{\beta}, \qquad \text{(J1)}$$

with $X_i'$ the $i$th row of $X$. This “regression predictor,” being
estimated from many degrees of freedom, has small variance. But unbiasedness
cannot be guaranteed, as Fay and Herriot showed for the 1970 census. 

In decennial census years, both “sample estimators” and “regression 
predictors” of the preceding paragraphs are available for all areas. Such is 
not the case in postcensal years and possibly will not be the case in 1985. 
How then, in a census year, should one choose between the unbiased, but 
noisy, “sample estimators,” $Y_i$, and the (probably) biased, but
low-variance, “regression predictors,” $\hat{Y}_i$? 

An empirical Bayes estimator estimates the true mean in the $i$th area, 
$\mu_i = EY_i$, by 

$$\hat{\mu}_i = (1 - B_i)Y_i + B_i\hat{Y}_i, \qquad \text{(J2)}$$

with $B_i = V_i/(V_i + W)$, $W$ being the variance of $\mu_i$ about the
regression surface. $W$ itself may be estimated from the data (see Efron and
Morris, 1975, 1977; Fay and Herriot, 1979). If $W$ is large, $B_i$ is close
to zero, so that $\hat{\mu}_i$ is nearly $Y_i$, the sample mean. Small values
of $W$ put almost full weight on $\hat{Y}_i$, the regression estimate.
Formula (J2) is a Bayes estimator, but since $W$ is estimated and not
determined independently of the data, it is called empirical Bayes (see Efron
and Morris, 1975, 1977; Fay and Herriot, 1979; and below). 
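
The shrinkage formula can be illustrated with a short computation. The
sketch below is not the Fay-Herriot procedure: it takes the regression
predictions as given and estimates W by a crude method of moments (the
average of squared residuals minus the sampling variances, truncated at
zero), which is an assumption made here only to keep the example short.
The function names and the input figures are hypothetical.

```python
def estimate_w(y, y_reg, v):
    """Crude moment estimate of W: mean of (Y_i - Yhat_i)^2 - V_i, floored at 0.

    This ignores the degrees of freedom used in fitting the regression and is
    only a stand-in for the estimators discussed in the cited papers.
    """
    excess = [(yi - ri) ** 2 - vi for yi, ri, vi in zip(y, y_reg, v)]
    return max(0.0, sum(excess) / len(excess))


def empirical_bayes(y, y_reg, v):
    """Return shrunken estimates (1 - B_i) * Y_i + B_i * Yhat_i and the W used."""
    w = estimate_w(y, y_reg, v)
    estimates = []
    for yi, ri, vi in zip(y, y_reg, v):
        # B_i = V_i / (V_i + W); put no weight on the regression if both are 0.
        b = vi / (vi + w) if (vi + w) > 0.0 else 0.0
        estimates.append((1.0 - b) * yi + b * ri)
    return estimates, w


if __name__ == "__main__":
    sample_means = [10.0, 20.0]   # hypothetical sample estimates Y_i
    regression = [12.0, 18.0]     # hypothetical regression predictors Yhat_i
    variances = [1.0, 1.0]        # hypothetical sampling variances V_i
    shrunk, w_hat = empirical_bayes(sample_means, regression, variances)
    print("estimated W:", w_hat)
    print("empirical Bayes estimates:", shrunk)
```

Areas with large sampling variance relative to the estimated W are pulled
most strongly toward the regression predictor, while precisely measured
areas stay close to their sample means.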

Statisticians have long known how best to average independent unbiased
estimators: they weight each by its reciprocal variance. With one of the two
estimators, $Y_i$, being unbiased, statisticians have developed the theory,
under an assumed model of the error distribution of $\hat{Y}_i$, for
estimating the mean squared error of $\hat{Y}_i$. The resultant estimator
weights both independent estimators $Y_i$ and $\hat{Y}_i$ by the reciprocals
of their mean squared errors. Carried out formally, this procedure results in
an empirical Bayes estimator. It reduces to Stein’s celebrated estimator
provided, for example, that the sample mean of income is equally variable in 
every area, which would not be true in Census Bureau applications. 
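
The link between reciprocal weighting and the empirical Bayes form can be
written out in one step. In the sketch below, $V_i$ is the variance of the
sample mean $Y_i$, $\hat{Y}_i$ is the regression predictor, and the mean
squared error of $\hat{Y}_i$ is taken to be $W$; treating the regression
coefficients as known, so that the mean squared error is exactly $W$, is a
simplifying assumption made here.

```latex
\begin{align*}
\hat{\mu}_i
  &= \frac{Y_i / V_i + \hat{Y}_i / W}{1 / V_i + 1 / W} \\
  &= \frac{W}{V_i + W}\, Y_i + \frac{V_i}{V_i + W}\, \hat{Y}_i \\
  &= (1 - B_i)\, Y_i + B_i\, \hat{Y}_i,
  \qquad B_i = \frac{V_i}{V_i + W}.
\end{align*}
```

When every area has the same sampling variance, $B_i$ is the same for all
areas, which is the equal-variance setting in which Stein's estimator
arises.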

The Panel endorses the work of Fay and Herriot and encourages con- 
tinued use of such methodology for income estimation. We believe the 
method will work in other applications when small-area estimates must be 
made. The Panel does not seek widespread use of such methods (they do not
apply in most situations and will not necessarily be beneficial in all
applicable cases*), but empirical Bayes methods are likely to improve 
estimators used in a variety of cases. Some possible applications are 
presented briefly below; these ideas are suggested for future research. 

In making the estimates of 1973 subcounty population used for revenue 
sharing, the Census Bureau set migration rates for areas with less than 
1,000 people equal to the county migration rate because the Bureau had 
little faith in the AR estimates of small-area migration rates. An empirical 
Bayes estimator could have been used to produce weighted averages of 
small-area rates and county rates and almost surely would have been 
superior to the county rates used. 

Empirical Bayes methods should also be explored as alternatives to the 
“tolerance check” methods currently used for estimating subcounty 
migration rates (see Appendix A, section 4. Id). In the tolerance check ap- 
proach, if the coverage ratio for tax returns of a subcounty area differs by 
more than a given amount from the coverage ratio for the county, the sub- 
county migration rate is estimated by either the county migration rate or 
the migration rate for a group of subcounty areas. Empirical Bayes 
methods could be used to determine weights for averaging the initial
subcounty migration rates and the migration rates for the group of
subcounties. These improved migration estimates could then be used in the AR 
estimates. 

The errors in local area population estimates vary by characteristics or 
“covariates” of the area, such as population size, growth rate, region of 
the country, etc. To control for these covariates when evaluating the
accuracy of the estimates, a common technique uses two-way or
higher-dimensional cross-classifications of average error by strata of values of 
population size, percent change, etc. (see Tables 2.1-2.11). When the 
number of observations in a cell is small, this “stratification” analysis may 
be unreliable. An alternative approach is to fit linear or other models to 
express the error in the estimate as a linear (or other) function of the 
covariates (percent change, population size, etc.). These covariance 

* For example, as shown below, the resulting estimates of population for the small areas will 
generally be biased. For areas with net migration rate below (above) the county rate, the bias 
will be positive (negative). In some cases the magnitude of the bias can be large (see Rao and 
Schinozaki, 1978). If the population estimates are used to determine the allocations of funds 
to the areas for successive time periods, areas with small net migration rates (relative to the 
county) get a favorable treatment in the long run at the expense of areas with larger net 
migration rates; and the situation is even more severe when the migration rates for the small 
areas are set equal to the county rate. An evaluation is recommended before adoption of
empirical Bayes methods in any particular application, to be sure that improvements will occur. 
models can aid in understanding how errors tend to vary according to the 
characteristics of an area. However, this practice carries some risk if the 
model specification fits poorly. Empirical Bayes methods could be used to 
shrink the stratification estimates toward estimates produced by the 
covariance model, thereby stabilizing the stratification estimates and 
reducing the risk of the covariance model approach. 

Two different methods of empirically determining weights for averaging 
different estimators have been suggested: 

1. Regression estimators (see section 5.2 of the report) can be used 
when two or more estimators have been proposed if CPS or special census 
data are available for a current year. The observations used for the evalua- 
tion must be approximately unbiased, but they need be available only in a 
small portion of all areas. This method works best for determining post- 
censal estimates. 

2. Empirical Bayes estimates can be used to combine unbiased sample 
estimates with regression predictors for the areas. This method can only 
be applied in sampled areas, unlike method 1, but it does not require CPS 
or special census data. Its prime purpose is to improve estimates made in 
a census year, as in the Fay-Herriot application. The two methods apply to 
different situations. The Panel has recommended that method 2 also be 
used to stabilize and improve postcensal estimates, but empirical Bayes 
methods do not perform the evaluation function of method 1. 
ASSEMBLY OF BEHAVIORAL AND SOCIAL SCIENCES 


2101 Constitution Avenue Washington, D.C. 20418 


COMMITTEE ON NATIONAL STATISTICS 


January 9, 1979 


Dr. Bernadine Denning 
Director 

Office of Revenue Sharing 
Department of the Treasury 
2401 E Street, N.W. 

Columbia Plaza High Rise 
Washington, D.C. 20226 

Dear Dr. Denning: 

The Panel on Small-Area Estimates of Population and Income has 
recently been established at the request of the Bureau of the Census 
under the auspices of the National Academy of Sciences. The Panel is in 
the process of reviewing the procedures used by the Bureau of the Census 
to make postcensal estimates of population and income for small areas. 

These estimates are used for the allocation of general revenue sharing 
funds, as well as for other major public purposes, such as health planning. 
Although the study will not be completed until December 1979, the Panel 
is writing to urge that the 1979 IRS income tax returns contain a special 
question to determine exact place of residence, as was included on 1975 
tax returns, for use by the Bureau of the Census. 

The information reported on the tax returns plays an essential role 
in the estimation procedure. By comparing changes in address and income 
of specific individuals in two sets of tax records, the Bureau uses the 
information on the tax returns in its estimation of migration and changes 
in per capita income. The mailing address on the return often is insuf- 
ficient for determining in which unit of local government the filer of 
the return actually resides. A question on residence was asked on the 
1975 IRS returns. It provided the essential information for allocation of 
mailing addresses to the appropriate places of residence and has served 
as the basis for such allocations since then. But localities experience 
different rates of growth, and, in many instances, the use of the 1975 
allocation factors is no longer appropriate. Annexations and boundary 
changes are frequent and, for many places, the allocations based on city 
boundaries as of 1975 are no longer valid. 

There is another important reason for including the place of 
residence question on the 1979 returns. There is a question as to how 
much the migration patterns and rates of change in income differ between 
the populations covered and not covered by tax returns. The proportions 
of population covered by tax returns (i.e., either filing or claimed as an 
exemption on a return) vary widely from one place to another. In using the 
tax data to estimate migration and changes in per capita income, the Bureau 
assumes that the migration patterns and the rates of change in wage and 
salary income are identical for the populations covered and not covered by 
tax returns. If the accuracy of the small-area estimates is to be
significantly improved, these operational assumptions need to be evaluated and 
modified accordingly. 

Because the filing dates for the 1979 tax returns are so close to the 
decennial census date of April 1, 1980, a rare opportunity exists to examine 
these assumptions by using the 1980 census results to compare the
characteristics of the populations covered and not covered by tax returns. The Panel 
feels that if the residence question is deferred to another year, the 
ability of the Bureau to examine its assumptions will be restricted. Under 
the provisions of Title 13, U.S.C., confidential treatment of the data is 
required. 


The Panel recommends that the place of residence 
question be included on the 1979 tax returns and 
that funds be sought to enable the Bureau of the 
Census to process the data obtained from the 
question. 

The Panel is fully aware of the efforts to keep the tax form simple 
and to minimize the amount of non-tax information called for. We also 
realize that processing the responses to the question is an expensive 
operation. But obtaining and analyzing the responses to the question is 
the most practical way to get the needed information. The 95-percent 
response rate in 1975 indicates good public cooperation in answering such 
a question. If the responses to the question are not obtained and analyzed, 
the Bureau’s ability to maintain the accuracy of the local estimates for 
the 1980s will be impaired and desired improvements will be impeded. 

A similar letter is being sent to the Director of the Bureau of the 
Census. We will also send copies of the letter to the Secretary of Commerce 
and to the Commissioner of the Internal Revenue Service. 

We would welcome the opportunity for further discussion. 

Sincerely yours, 

Evelyn M. Kitagawa 

Chairman 

Panel on Small-Area Estimates of 
Population and Income 